Problem description
I have some code which loops through a large set of itertools.combinations, which is now a performance bottleneck. I'm trying to turn to numba's @jit(nopython=True) to speed it up, but I'm running into some issues.
First, it seems numba can't handle itertools.combinations itself, per this small example:
```python
import itertools
import numpy as np
from numba import jit

arr = [1, 2, 3]
c = 2

@jit(nopython=True)
def using_it(arr, c):
    return itertools.combinations(arr, c)

for i in using_it(arr, c):
    print(i)
```
This throws the error:
numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Unknown attribute 'combinations' of type Module(<module 'itertools' (built-in)>)
After some googling, I found this GitHub issue where the questioner proposed this numba-safe function for calculating permutations:
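```python
@jit(nopython=True)
def permutations(A, k):
    # Empty list built by a comprehension, so numba can infer an integer element type
    r = [[i for i in range(0)]]
    for i in range(k):
        # Extend every partial permutation b with each element a not already in it
        r = [[a] + b for a in A for b in r if (a in b)==False]
    return r
```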
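A rough count shows why this approach scales badly. The comprehension materialises all n!/(n-k)! ordered k-tuples (each candidate also pays a linear `a in b` scan), and the sorted-order filter applied afterwards keeps only C(n, k) of them, i.e. one in k!. The sketch below uses only the standard library's `math.perm` and `math.comb` (Python 3.8+); the helper name `permutation_overhead` is hypothetical.

```python
import math

def permutation_overhead(n, k):
    """Compare how many k-permutations the permutation-based approach builds
    with how many k-combinations survive the sorted() filter."""
    built = math.perm(n, k)   # n! / (n - k)!  ordered k-tuples
    kept = math.comb(n, k)    # n! / (k! (n - k)!)  unordered k-subsets
    return built, kept, built // kept  # the ratio is always k!

print(permutation_overhead(20, 2))  # (380, 190, 2)
print(permutation_overhead(20, 5))  # (1860480, 15504, 120)
```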
Leveraging that, I can then easily filter down to combinations:
```python
@jit(nopython=True)
def combinations(A, k):
    # Keep only the permutations whose elements are already in ascending order
    return [item for item in permutations(A, k) if sorted(item) == item]
```
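Stripped of `@jit`, the same two functions can be checked against `itertools.combinations` in plain Python. This sketch (the helper `matches_itertools` is hypothetical) suggests the filter only reproduces `itertools.combinations` when `A` is sorted and contains no duplicate values: `sorted(item) == item` keeps ascending tuples only, and `a not in b` compares by value rather than by position.

```python
import itertools

def permutations(A, k):
    # Same list-building logic as the numba-safe version, without @jit
    r = [[]]
    for _ in range(k):
        r = [[a] + b for a in A for b in r if a not in b]
    return r

def combinations(A, k):
    # Keep only the permutations that are already in ascending order
    return [item for item in permutations(A, k) if sorted(item) == item]

def matches_itertools(A, k):
    ours = [tuple(c) for c in combinations(A, k)]
    return ours == list(itertools.combinations(A, k))

print(matches_itertools(list(range(6)), 3))  # True
print(matches_itertools([3, 1, 2], 2))       # False: itertools keeps input order
print(matches_itertools([1, 1, 2], 2))       # False: (1, 1) is never produced
```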
Now I can run that combinations function without errors and get the correct result. However, this is now dramatically slower with the @jit(nopython=True) than without it. Running this timing test:
```python
import pandas as pd

A = list(range(20))  # numba throws 'cannot determine numba type of range' w/o list
k = 2

start = pd.Timestamp.utcnow()
print(combinations(A, k))
print(f"took {pd.Timestamp.utcnow() - start}")
```
This clocks in at 2.6 seconds with the numba @jit(nopython=True) decorators, and under 1/1000 of a second with them commented out. So that's not really a workable solution for me either.
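Part of that 2.6 seconds is likely compilation: numba compiles a `@jit` function the first time it is called with a given set of argument types, so timing a single first call measures compile time plus run time. A minimal timing helper (the name `time_calls` is hypothetical; standard library only) that records each call separately makes the difference visible. For a jitted function the first entry includes compilation, and the later entries show steady-state speed.

```python
import time

def time_calls(func, *args, repeat=3):
    """Call func(*args) `repeat` times and return the wall-clock seconds of each call.

    For a numba-jitted function, the first entry also includes JIT compilation
    for these argument types; the remaining entries reflect the compiled code only.
    """
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return timings

# Works with any callable; for the question's code this would be
# time_calls(combinations, A, k) with A = list(range(20)) and k = 2.
print(time_calls(sorted, list(range(1000))))
```

Even after warm-up, the jitted version may not beat pure Python here, since it still builds every permutation as a list before filtering.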
Recommended answer
There is not much to gain with Numba in this case, as itertools.combinations is written in C.
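If the real bottleneck is the work done per combination rather than the enumeration itself, one common pattern (a sketch, not part of the answer; `combinations_array` is a hypothetical helper) is to let `itertools.combinations` enumerate index tuples at C speed, pack them into a 2-D NumPy array with `np.fromiter`, and then pass that array to a `@jit(nopython=True)` function, since numba supports NumPy arrays in nopython mode.

```python
import itertools
import math

import numpy as np

def combinations_array(pool, k):
    """Return all k-combinations of `pool` as rows of a 2-D NumPy array.

    Row order matches itertools.combinations(pool, k).
    """
    n_rows = math.comb(len(pool), k)
    # itertools (written in C) enumerates index tuples; chain flattens them
    flat = np.fromiter(
        itertools.chain.from_iterable(itertools.combinations(range(len(pool)), k)),
        dtype=np.int64,
        count=n_rows * k,
    )
    index_rows = flat.reshape(n_rows, k)
    # Fancy indexing maps each row of indices back to the pool's values
    return np.asarray(pool)[index_rows]

combos = combinations_array([10, 20, 30, 40], 2)
print(combos.shape)     # (6, 2)
print(combos.tolist())  # [[10, 20], [10, 30], [10, 40], [20, 30], [20, 40], [30, 40]]
```

A jitted function can then loop over the rows of such an array without numba ever needing to type `itertools`.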
If you want to benchmark it, here is a Numba / Python implementation of what itertools.combinations does:
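```python
from numba import jit

@jit(nopython=True)
def using_numba(pool, r):
    n = len(pool)
    indices = list(range(r))
    # Nothing is produced if the pool is empty or r is not in 1..n
    empty = not(n and (0 < r <= n))

    if not empty:
        # The first combination is simply the first r elements
        result = [pool[i] for i in indices]
        yield result

    while not empty:
        # Find the rightmost index that can still be incremented
        i = r - 1
        while i >= 0 and indices[i] == i + n - r:
            i -= 1
        if i < 0:
            # Every index is at its maximum: all combinations have been produced
            empty = True
        else:
            indices[i] += 1
            # Reset the indices to its right to consecutive values
            for j in range(i+1, r):
                indices[j] = indices[j-1] + 1

            result = [pool[i] for i in indices]
            yield result
```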
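A variant of the same index-advancing step (a sketch, not benchmarked; the names `fill_combination_indices` and `combination_indices` are hypothetical) writes each combination of indices into a preallocated NumPy array instead of yielding new lists. It is shown as plain Python. Whether the loop function compiles unchanged under `@jit(nopython=True)` is an untested assumption, which is why the `math.comb` call is kept in a separate wrapper outside the loop.

```python
import math

import numpy as np

def fill_combination_indices(out, n):
    """Fill `out` (shape (comb(n, r), r)) with the r-combinations of range(n)
    in lexicographic order, using the same "advance the rightmost index that
    can still move" step as the using_numba generator."""
    count, r = out.shape
    if count == 0:
        return
    indices = np.arange(r)
    out[0, :] = indices
    for row in range(1, count):
        # Rightmost position that has not yet reached its maximum value i + n - r
        i = r - 1
        while indices[i] == i + n - r:
            i -= 1
        indices[i] += 1
        # Reset every position to the right to the smallest increasing run
        for j in range(i + 1, r):
            indices[j] = indices[j - 1] + 1
        out[row, :] = indices

def combination_indices(n, r):
    """Allocate the output array and fill it; math.comb stays outside the loop."""
    out = np.empty((math.comb(n, r), r), dtype=np.int64)
    fill_combination_indices(out, n)
    return out

print(combination_indices(4, 2).tolist())
# [[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]]
```

Mapping the indices back to values is a single fancy-indexing step, e.g. `np.asarray(pool)[combination_indices(len(pool), r)]`.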
On my machine, this is about 15 times slower than itertools.combinations. Getting the permutations and filtering the combinations would certainly be even slower.