问题描述
如果我有一个未排序的大集合 n
个整数(例如其中 2 ^ 20
个),并且想要生成每个元素都具有 k
个元素的子集(其中 k
很小,例如 5
)按其总和递增的顺序,最有效的方法是什么?
If I have an unsorted large set of n
integers (say 2^20
of them) and would like to generate subsets with k
elements each (where k
is small, say 5
) in increasing order of their sums, what is the most efficient way to do so?
为什么要以这种方式生成这些子集,是因为我想找到具有满足特定条件的最小和的k元素子集,并且因此,我将条件应用于生成的每个k元素子集。
此外,算法的复杂度是多少?
Also, what would be the complexity of the algorithm?
这里有一个类似的问题:按产品顺序排列,但由于集合 n
There is a similar question here: Algorithm to get every possible subset of a list, in order of their product, without building and sorting the entire list (i.e Generators) about generating subsets in order of their product, but it wouldn't fit my needs due to the extremely large size of the set n
我打算在Mathematica中实现该算法,但也可以在C ++或Python中实现。
I intend to implement the algorithm in Mathematica, but could do it in C++ or Python too.
推荐答案
如果需要的属性小潜艇的ts(将其称为 P
)非常普遍,一种概率方法可能效果很好:
If your desired property of the small subsets (call it P
) is fairly common, a probabilistic approach may work well:
- 将
n
个整数排序(对于数百万个整数,即ram的10s至100s MB,这应该不成问题),然后求和k-1
最小。称此总计偏移量
。 - 生成随机的
k
子集(例如,通过采样k
个随机数,modn
)并检查它的P
-ness。 - 在比赛中,记下子集的总和。从中减去
偏移量
可以找到等效总和的任何k
子集中最大元素的上限。 - 将您的
n
个整数限制为小于或等于此界限的整数。 - 重复(转到2),直到在固定的迭代次数内未找到匹配项为止。
- Sort the
n
integers (for millions of integers i.e. 10s to 100s of MB of ram, this should not be a problem), and sum thek-1
smallest. Call this totaloffset
. - Generate a random
k
-subset (say, by samplingk
random numbers, modn
) and check it forP
-ness. - On a match, note the sum-total of the subset. Subtract
offset
from this to find an upper bound on the largest element of anyk
-subset of equivalent sum-total. - Restrict your set of
n
integers to those less than or equal to this bound. - Repeat (goto 2) until no matches are found within some fixed number of iterations.
请注意,初始排序为 O(n log n)
。在第4步中隐含的二进制搜索为 O(log n)
。
Note the initial sort is O(n log n)
. The binary search implicit in step 4 is O(log n)
.
显然,如果 P
如此稀有,以至于随机击球不可能获得匹配,这对你没有好处。
Obviously, if P
is so rare that random pot-shots are unlikely to get a match, this does you no good.
这篇关于按总和顺序生成k个元素子集的算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!