Problem description
I want to generate a lexicographic series of numbers such that for each number the sum of digits is a given constant. It is somewhat similar to the 'subset sum problem'. For example, if I wish to generate 4-digit numbers with sum = 3, then I have a series like:
[3 0 0 0]
[2 1 0 0]
[2 0 1 0]
[2 0 0 1]
[1 2 0 0] ... and so on.
I was able to do it successfully in Python with the following code:
import numpy as np

M = 4  # No. of digits
N = 3  # Target sum
a = np.zeros((1, M), int)
b = np.zeros((1, M), int)
a[0][0] = N
jj = 0
while a[jj][M-1] != N:
    ii = M - 2
    while a[jj][ii] == 0:
        ii = ii - 1
    kk = ii
    if kk > 0:
        b[0][0:kk-1] = a[jj][0:kk-1]
    b[0][kk] = a[jj][kk] - 1
    b[0][kk+1] = N - sum(b[0][0:kk+1])
    b[0][kk+2:] = 0
    a = np.concatenate((a, b), axis=0)
    jj += 1

for ii in range(0, len(a)):
    print(a[ii])
print(len(a))
I don't think this is a very efficient way (I am a Python newbie). It works fine for small values of M and N (< 10) but becomes really slow beyond that. I wish to use it for M ~ 100 and N ~ 6. How can I make my code more efficient, or is there a better way to code it?
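A likely source of the slowdown is the np.concatenate call, which copies the whole array on every iteration, so the loop is quadratic in the number of rows produced. Below is a minimal sketch of the same iteration that collects rows in a plain Python list instead; the names rows/prev/new are illustrative, not from the original code:

import numpy as np

M = 4  # No. of digits
N = 3  # Target sum

rows = [[N] + [0] * (M - 1)]          # start from [N, 0, ..., 0]
while rows[-1][M - 1] != N:           # stop once all of N sits in the last digit
    prev = rows[-1]
    ii = M - 2
    while prev[ii] == 0:              # rightmost nonzero digit (excluding the last one)
        ii -= 1
    new = prev[:ii]                   # unchanged prefix
    new.append(prev[ii] - 1)          # decrement that digit
    new.append(N - sum(new))          # put the remainder right after it
    new.extend([0] * (M - len(new)))  # pad with zeros
    rows.append(new)                  # amortized O(1), no full-array copy

a = np.array(rows)                    # convert once at the end, if an array is needed
for row in a:
    print(row)
print(len(a))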
Recommended answer
A very effective algorithm, adapted from Jorg Arndt's book "Matters Computational" (Chapter 7.2, Co-lexicographic order for compositions into exactly k parts):
n = 4
k = 3
x = [0] * n
x[0] = k
while True:
    print(x)
    v = x[-1]
    if k == v:
        break
    x[-1] = 0
    j = -2
    while 0 == x[j]:
        j -= 1
    x[j] -= 1
    x[j + 1] = 1 + v
[3, 0, 0, 0]
[2, 1, 0, 0]
[2, 0, 1, 0]
[2, 0, 0, 1]
[1, 2, 0, 0]
[1, 1, 1, 0]
[1, 1, 0, 1]
[1, 0, 2, 0]
[1, 0, 1, 1]
[1, 0, 0, 2]
[0, 3, 0, 0]
[0, 2, 1, 0]
[0, 2, 0, 1]
[0, 1, 2, 0]
[0, 1, 1, 1]
[0, 1, 0, 2]
[0, 0, 3, 0]
[0, 0, 2, 1]
[0, 0, 1, 2]
[0, 0, 0, 3]
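For reuse (counting, timing, or feeding the rows into other code) the same loop can be wrapped in a generator. A sketch; the function name compositions is illustrative:

def compositions(n, k):
    # Yield all weak compositions of k into n parts, in the same
    # co-lexicographic order as the loop above.
    x = [0] * n
    x[0] = k
    while True:
        yield list(x)      # yield a copy so callers may keep the rows
        v = x[-1]
        if k == v:
            return
        x[-1] = 0
        j = -2
        while 0 == x[j]:
            j -= 1
        x[j] -= 1
        x[j + 1] = 1 + v

# Example: count the 4-digit compositions of 3 (20, matching the listing above)
print(sum(1 for _ in compositions(4, 3)))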
Number of compositions and time in seconds for plain Python (perhaps numpy arrays would be faster), for n = 100 and k = 2, 3, 4, 5 (2.8 GHz Cel-1840):
2 5050 0.040000200271606445
3 171700 0.9900014400482178
4 4421275 20.02204465866089
5 91962520 372.03577995300293
I expect the 100/6 generation to take about 2 hours.
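The counts themselves can be checked independently: the number of ways to write k as an ordered sum of n nonnegative parts is C(n+k-1, k). A quick check with math.comb (Python 3.8+):

from math import comb

n = 100
for k in range(2, 7):
    print(k, comb(n + k - 1, k))   # number of weak compositions of k into n parts

# 2 5050
# 3 171700
# 4 4421275
# 5 91962520
# 6 1609344100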
The same code with numpy arrays (x = np.zeros((n,), dtype=int)) gives worse results, but perhaps that is because I don't know how to use them properly:
2 5050 0.07999992370605469
3 171700 2.390003204345703
4 4421275 54.74532389640808
Native code (this is Delphi; C/C++ compilers might optimize better) generates 100/6 in 21 seconds:
3 171700 0.012
4 4421275 0.125
5 91962520 1.544
6 1609344100 20.748
Cannot go to sleep until all the measurements are done :)
MSVS VC++: 18 seconds! (O2 optimization)
5 91962520 1.466
6 1609344100 18.283
So about 100 million variants per second. A lot of time is wasted on checking empty cells (because the fill ratio is small). The speed Arndt describes is reached at higher k/n ratios and is about 300-500 million variants per second:
n=25, k=15: 25140840660 compositions in 60.981 s (about 400 million per second)
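Since most cells are zero when k is much smaller than n, a different way to sidestep the zero-scan entirely (not part of the answer above, just an alternative technique) is to enumerate the positions of the k units directly, "stars and bars" style, with itertools.combinations_with_replacement, and expand each position tuple into a digit vector. A sketch:

from itertools import combinations_with_replacement

n, k = 4, 3
for pos in combinations_with_replacement(range(n), k):
    # pos says into which of the n cells each of the k units falls
    x = [0] * n
    for p in pos:
        x[p] += 1
    print(x)

For n = 4, k = 3 this reproduces the twenty rows above in the same order; whether it is actually faster for n ~ 100, k = 6 would need measuring.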