Problem description

Cleaned-up text:

How can I create m=5 random numbers that add up to, say, n=100? But the first random number is, say, 10 < x1 < 30, the second random nr is 5 < x2 < 20, the third random nr is 10 < x3 < 25, etc. So these five random numbers add up to 100. How can I create these constrained five numbers?

[

Related problem A1): The standard way to create five random numbers that add up to 100 is to sample four numbers between [0,100], add the boundaries 0 and 100, and then sort these six numbers [0,x1,x2,x3,x4,100]. The five random numbers I seek are the deltas. That is,

100 - x[4] = delta 5
x[4]- x[3] = delta 4
x[3]- x[2] = delta 3
x[2]- x[1] = delta 2
x[1] - 0   = delta 1

These five deltas will now add up to 100. For instance, they might be 0,1,2,7,90. Here is some code that solves this problem:

import numpy

total_sum = 100
n = 5
v = numpy.random.multinomial(total_sum, numpy.ones(n)/n)
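
For comparison, here is a minimal sketch of the sort-and-diff construction described in A1) itself (the helper name random_partition is my own, not from the question; numpy is assumed):

import numpy

def random_partition(total, m):
    # A1): sample m-1 cut points in [0, total], add the boundaries 0 and
    # total, sort, and take the differences; the m deltas sum to total.
    cuts = numpy.sort(numpy.random.randint(0, total + 1, size=m - 1))
    return numpy.diff(numpy.concatenate(([0], cuts, [total])))

deltas = random_partition(100, 5)
assert deltas.sum() == 100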

]

For my problem, I cannot allow wide intervals to occur; the largest spread above is 90 - 7 = 83, which is too wide. So I have to specify a tighter spread, say [10,30]. This means the largest random number is 30, which disallows large spreads such as 83.

[

Related problem A2): A partial solution to create five numbers with identical boundaries, 10 < x_i < 30, that add up to 100 is like this: just do as in A1), but add the lower boundary 10 to the deltas. So I get the five random numbers that I seek like this:

100 - x[4] = delta 5 + 10
x[4]- x[3] = delta 4 + 10
x[3]- x[2] = delta 3 + 10
x[2]- x[1] = delta 2 + 10
x[1] - 0   = delta 1 + 10

Basically, I do exactly as in A1), but I do not start from 0; I start from 10. Thus, each number has the lower boundary 10, but they don't have an upper boundary; they can be large, too large. How can I limit the upper boundary to 30? The problem here is how to limit the upper boundary.
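
A minimal sketch of this lower-bound-only A2) construction (the helper name random_partition_low and the use of numpy are my own; the upper bound is deliberately not enforced, which is exactly the open problem):

import numpy

def random_partition_low(total, m, low):
    # A2): distribute the remainder total - m*low as in A1), then add the
    # lower bound back to every delta, so each number is at least `low`.
    remainder = total - m * low              # e.g. 100 - 5*10 = 50
    cuts = numpy.sort(numpy.random.randint(0, remainder + 1, size=m - 1))
    deltas = numpy.diff(numpy.concatenate(([0], cuts, [remainder])))
    return deltas + low

nums = random_partition_low(100, 5, 10)
assert nums.sum() == 100 and (nums >= 10).all()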

]

To recapitulate, the type of the problem I try to solve looks like this: I need five random numbers adding up to 100 and I need to specify the boundaries separately for each number, say [10,30] for the first random number, and then [5,10] for the second random number, and [15,35] for the third random number, etc. And they must all add up to 100.

But the real data I am using has ~100 numbers x_i (m=50), all of them adding up to, say, ~400,000. And the range is typically [3000,5000] for a number x_i. These numbers are not really accurate; I am only trying to convey something about the problem size. The purpose is to do an MCMC simulation, so these numbers need to be generated quickly. People have suggested very elegant solutions that really do work, but they take too long, so I cannot use them. The problem is still unsolved. Ideally I would like an O(m) time and O(1) memory solution.

This problem should not be NP-hard; it doesn't feel like it. There should be a polynomial-time solution, right?

Recommended answer

Suppose you need n_1 in [10,30], n_2 in [20,40], n_3 in [30,50] and n_1 + n_2 + n_3 = 90.

If you need each possible triplet (n_1, n_2, n_3) to be equally likely, that's going to be difficult. The number of triplets of the form (20, n_2, n_3) is greater than the number of triplets (10, n_2, n_3), so you can't just pick n_1 uniformly.
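
A quick brute-force check of that claim (count_triples is just an illustration of mine, using the ranges from this example):

def count_triples(first):
    # Count the triplets (first, n_2, n_3) with n_2 in [20,40], n_3 in [30,50]
    # that sum to 90.
    return sum(1 for n2 in range(20, 41) for n3 in range(30, 51)
               if first + n2 + n3 == 90)

print(count_triples(10), count_triples(20))   # 11 21 -- not equally many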

The incredibly slow but accurate way is to generate all 5 randoms in the correct ranges and reject the whole group if the sum is not correct.
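
A minimal sketch of that rejection approach (the helper name rejection_sample is mine):

import random

def rejection_sample(ranges, target):
    # Draw each x_k uniformly in its own [low, high]; start over until the
    # sum happens to hit the target. Correct, but can be extremely slow.
    while True:
        candidate = [random.randint(low, high) for low, high in ranges]
        if sum(candidate) == target:
            return candidate

print(rejection_sample([(10, 30), (20, 40), (30, 50)], 90))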

I found a way to parametrize the choice effectively. First, though, for simplicity, note that the sum of the low bounds is the minimum possible sum. If you subtract the sum of the low bounds from the target number, and subtract the low bound from each generated number, you get a problem where each number is in the interval [0, max_k - min_k]. That simplifies the math and the array (list) handling. Let n_k be the 0-based choice, with 0 <= n_k <= max_k - min_k.
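
For example, with the ranges above, n_1 in [10,30], n_2 in [20,40], n_3 in [30,50] and a target of 90: the low bounds sum to 10 + 20 + 30 = 60, so the equivalent 0-based problem asks for n_1, n_2, n_3 each in [0,20] that sum to 90 - 60 = 30.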

The order of the sums is lexicographic, with all sums beginning with n_1 = 0 (if any) first, then the n_1 == 1 sums, etc. Within each of those groups, sums are sorted by n_2, then by n_3, and so on. If you know how many sums add to the target (call that T), and how many sums start with n_1 = 0, 1, 2, ..., then you can find the starting number n_1 of sum number S in that list. Then you can reduce the problem to adding n_2 + n_3 + ... to get T - n_1, finding sum number S - (the number of original sums starting with a number less than n_1).

Let pulse(n) be a list of n+1 ones: (n+1)*[1] in Python terms. Let max_k, min_k be the limits for the k'th choice, and m_k = max_k - min_k the upper limit for the 0-based choice. Then there are 1 + m_1 different "sums" from the choice of the first number, and pulse(m_1) gives the distribution: one way to make each sum from 0 to m_1. For the first two choices, there are m_1 + m_2 + 1 different sums. It turns out that the convolution of pulse(m_1) with pulse(m_2) gives the distribution.

Time to stop for some code:

    def pulse(width, value=1):
        ''' Returns a vector of (width+1) integer ones. '''
        return (width+1)*[value]

    def stepconv(vector, width):
        ''' Computes the discrete convolution of vector with a "unit"
            pulse of given width.

            Formula: result[i] = Sum[j=0 to width] 1*vector[i-j]
            Where 0 <= i <= len(vector)+width-1, and the "1*" is the value
            of the implied unit pulse function: pulse[j] = 1 for 0<=j<=width.
        '''
        result = width*[0] + vector;
        for i in range(len(vector)):
            result[i] = sum(result[i:i+width+1])
        for i in range(len(vector), len(result)):
            result[i] = sum(result[i:])
        return result

That's coded specifically for only doing convolutions with a "pulse" array, so every linear combination in the convolution is just a sum.

Those are used only in the constructor of the final class solution:

import random

class ConstrainedRandom(object):
    def __init__(self, ranges=None, target=None, seed=None):
        self._rand = random.Random(seed)
        if ranges != None: self.setrange(ranges)
        if target != None: self.settarget(target)

    def setrange(self, ranges):
        self._ranges = ranges
        self._nranges = len(self._ranges)
        self._nmin, self._nmax = zip(*self._ranges)
        self._minsum = sum(self._nmin)
        self._maxsum = sum(self._nmax)
        self._zmax = [y-x for x,y in self._ranges]
        self._rconv = self._nranges * [None]
        self._rconv[-1] = pulse(self._zmax[-1])
        for k in range(self._nranges-1, 0, -1):
            self._rconv[k-1] = stepconv(self._rconv[k], self._zmax[k-1])

    def settarget(self, target):
        self._target = target

    def next(self, target=None):
        k = target if target != None else self._target
        k = k - self._minsum;
        N = self._rconv[0][k]
        seq = self._rand.randint(0,N-1)
        result = self._nranges*[0]
        for i in range(len(result)-1):
            cv = self._rconv[i+1]
            r_i = 0
            while k >= len(cv):
                r_i += 1
                k -= 1
            while cv[k] <= seq:
                seq -= cv[k]
                r_i += 1
                k -= 1
            result[i] = r_i
        result[-1] = k  # the remaining k is the (0-based) last choice
        return [x+y for x,y in zip(result, self._nmin)]

    # end class ConstrainedRandom

Use with:

ranges = [(low, high), (low, high), ...]
cr = ConstrainedRandom(ranges, target)
seq = cr.next();
print(seq)
assert sum(seq)==target

seq = cr.next(); # then get the next one.

...etc. The class could be trimmed down a bit, but the main space overhead is in the _rconv list, which has the stored convolutions. That's roughly N*T/2, for O(NT) storage.

The convolutions only use the ranges, so with a lot of randoms generated under the same constraints, the table construction time "amortizes away" to zero. The time complexity of .next() is roughly T/2 on average and O(T) worst case, in terms of the number of indexes into the _rconv lists.

To see how the algorithm works, assume a sequence of 3 zero-based choices with max values (5,7,3), and a 0-based target T=10. Define or import the pulse and stepconv functions in an IDLE session, then:

>>> pulse(5)
[1, 1, 1, 1, 1, 1]
>>> K1 = pulse (5)
>>> K2 = stepconv(K1, 7)
>>> K3 = stepconv(K2, 3)
>>> K1
[1, 1, 1, 1, 1, 1]
>>> K2
[1, 2, 3, 4, 5, 6, 6, 6, 5, 4, 3, 2, 1]
>>> K3
[1, 3, 6, 10, 14, 18, 21, 23, 23, 21, 18, 14, 10, 6, 3, 1]
>>> K3[10]
18
>>> sum(K3)
192
>>> (5+1)*(7+1)*(3+1)
192

K3[i] shows the number of different choices (n_1, n_2, n_3) such that 0 <= n_k <= m_k and Σ n_k = i. Let * mean convolution when applied to two of these lists. Then pulse(m_2) * pulse(m_3) gives the distribution of the sums of n_2 and n_3:

>>> R23 = stepconv(pulse(7),3)
>>> R23
[1, 2, 3, 4, 4, 4, 4, 4, 3, 2, 1]
>>> len(R23)
11

Every value from 0 to T=10 is (barely) possible, so any choice is possible for the first number, and there are R23[T - n_1] possible triplets adding to T=10 that start with n_1. So, once you've found that there are 18 possible sums adding to 10, generate a random number S with 0 <= S < 18 and count down through the R23[T:T-m_1-1:-1] array:

>>> R23[10:10-5-1:-1]
[1, 2, 3, 4, 4, 4]
>>> sum(R23[10:10-5-1:-1])
18

Note that the sum of that list is the total computed in K3[10] above: a sanity check. Anyway, if S == 9 was the random choice, then find how many leading terms of that array can be summed without exceeding S. That is the value of n_1. In this case 1+2+3 <= S but 1+2+3+4 > S, so n_1 is 3.
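
A small standalone illustration of that count-down step (the helper name pick_first is mine, not part of the class above):

R23 = [1, 2, 3, 4, 4, 4, 4, 4, 3, 2, 1]   # pulse(7) convolved with pulse(3)

def pick_first(cv, k, seq):
    # Count down cv[k], cv[k-1], ... subtracting each term from seq until
    # the next term would overshoot; the number of terms consumed is n_1.
    n1 = 0
    while cv[k] <= seq:
        seq -= cv[k]
        n1 += 1
        k -= 1
    return n1

assert pick_first(R23, 10, 9) == 3   # 1+2+3 <= 9, but 1+2+3+4 > 9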

As described above, you can then reduce the problem to find n_2. The final number (n_3 in this example) will be uniquely determined.
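
Putting it together, a quick sanity check for the same zero-based example using the ConstrainedRandom class (a sketch assuming the class behaves as described; the (0, max) pairs express the 0-based limits):

ranges = [(0, 5), (0, 7), (0, 3)]
cr = ConstrainedRandom(ranges, target=10)
draw = cr.next()
print(draw)                # e.g. [3, 4, 3]
assert sum(draw) == 10
assert all(lo <= x <= hi for (lo, hi), x in zip(ranges, draw))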
