Numba-safe version of itertools.combinations?

Question

I have some code that loops through a large set of itertools.combinations, and it is now a performance bottleneck. I'm trying to use numba's @jit(nopython=True) to speed it up, but I'm running into some issues.

First, it seems numba can't handle itertools.combinations itself, as this small example shows:

import itertools
import numpy as np
from numba import jit

arr = [1, 2, 3]
c = 2

@jit(nopython=True)
def using_it(arr, c):
    return itertools.combinations(arr, c)

for i in using_it(arr, c):
    print(i)

This throws an error:

numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Unknown attribute 'combinations' of type Module(<module 'itertools' (built-in)>)

After some googling, I found this GitHub issue, where the questioner proposed this numba-safe function for calculating permutations:

@jit(nopython=True)
def permutations(A, k):
    r = [[i for i in range(0)]]
    for i in range(k):
        r = [[a] + b for a in A for b in r if (a in b)==False]
    return r

Leveraging that, I can then easily filter down to combinations:

@jit(nopython=True)
def combinations(A, k):
    return [item for item in permutations(A, k) if sorted(item) == item]

Now I can run that combinations function without errors and get the correct result. However, it is dramatically slower with @jit(nopython=True) than without it. Running this timing test:

import pandas as pd

A = list(range(20))  # numba throws 'cannot determine numba type of range' w/o list
k = 2
start = pd.Timestamp.utcnow()
print(combinations(A, k))
print(f"took {pd.Timestamp.utcnow() - start}")

clocks in at 2.6 seconds with the numba @jit(nopython=True) decorators, and under 1/1000 of a second with them commented out. So that's not really a workable solution for me either.
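One thing worth checking with a measurement like this: Numba compiles a function the first time it is called with a given argument type, so a single timed call may include compilation time as well as run time. The sketch below times two consecutive calls separately. The function count_pairs and the helper timed are hypothetical names chosen for illustration, and the try/except fallback to plain Python is an assumption added only so the snippet still runs where Numba is not installed.

```python
import time

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python so the sketch still runs without Numba.
    def njit(func):
        return func


@njit
def count_pairs(n):
    # Count index pairs (i, j) with i < j, i.e. the number of 2-combinations.
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
    return total


def timed(func, *args):
    # Return the function's result together with the elapsed wall-clock time.
    start = time.perf_counter()
    value = func(*args)
    return value, time.perf_counter() - start


# With Numba installed, the first call triggers compilation; the second reuses it.
first_value, first_seconds = timed(count_pairs, 20)
second_value, second_seconds = timed(count_pairs, 20)
print(f"first call:  {first_seconds:.6f} s")
print(f"second call: {second_seconds:.6f} s")
```

If the second call is much faster than the first, most of the measured time was compilation rather than the loop itself.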

Recommended answer

There is not much to gain with Numba in this case, as itertools.combinations is written in C.
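Since the C implementation is already fast, one possible arrangement is to build the combinations with itertools outside the jitted code, convert them to a NumPy array, and let Numba handle only the numeric work per combination. This is a minimal sketch of that idea, not something taken from the thread: sum_of_pair_products is a hypothetical stand-in for whatever the real loop computes, and the try/except fallback is an assumption so the snippet runs without Numba.

```python
import itertools

import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python so the sketch still runs without Numba.
    def njit(func):
        return func


@njit
def sum_of_pair_products(pairs):
    # pairs is a 2-D integer array with one 2-combination per row.
    total = 0
    for row in range(pairs.shape[0]):
        total += pairs[row, 0] * pairs[row, 1]
    return total


values = list(range(20))
# itertools does the enumeration in C; the result becomes a (190, 2) array.
pairs = np.array(list(itertools.combinations(values, 2)), dtype=np.int64)
print(pairs.shape)
print(sum_of_pair_products(pairs))
```

This materializes every combination in memory, so it only suits cases where the full set fits comfortably in an array.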

If you want to benchmark a Numba version anyway, here is a Numba / Python implementation of what itertools.combinations does:

@jit(nopython=True)
def using_numba(pool, r):
    n = len(pool)
    indices = list(range(r))
    empty = not(n and (0 < r <= n))

    if not empty:
        result = [pool[i] for i in indices]
        yield result

    while not empty:
        i = r - 1
        while i >= 0 and indices[i] == i + n - r:
            i -= 1
        if i < 0:
            empty = True
        else:
            indices[i] += 1
            for j in range(i+1, r):
                indices[j] = indices[j-1] + 1

            result = [pool[i] for i in indices]
            yield result
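The index-advancing logic in using_numba can be checked without Numba at all. The sketch below restates the same algorithm as a plain Python generator under the hypothetical name index_combinations and compares its output with itertools.combinations. Like the function above, it yields nothing when r is 0, whereas itertools.combinations yields a single empty tuple, so the comparison starts at r = 1.

```python
import itertools


def index_combinations(pool, r):
    # Plain-Python restatement of the index-based algorithm shown above.
    n = len(pool)
    if r <= 0 or r > n:
        return
    indices = list(range(r))
    yield [pool[i] for i in indices]
    while True:
        # Find the rightmost index that has not yet reached its maximum value.
        i = r - 1
        while i >= 0 and indices[i] == i + n - r:
            i -= 1
        if i < 0:
            return
        # Advance it and reset every index to its right to the smallest valid value.
        indices[i] += 1
        for j in range(i + 1, r):
            indices[j] = indices[j - 1] + 1
        yield [pool[k] for k in indices]


pool = list(range(6))
for r in range(1, len(pool) + 1):
    ours = [tuple(c) for c in index_combinations(pool, r)]
    assert ours == list(itertools.combinations(pool, r))
print("all sizes match itertools.combinations")
```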

On my machine, using_numba is about 15 times slower than itertools.combinations. Getting the permutations and filtering the combinations would certainly be even slower.
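The cost of the filtering approach can be estimated by counting: there are k! permutations for every combination, so a filter over all k-permutations examines k! times as many candidates as it keeps. A short sketch using the standard library (the helper name wasted_factor is hypothetical):

```python
import math


def wasted_factor(n, k):
    # Ratio of k-permutations to k-combinations of n items; equals k!.
    return math.perm(n, k) // math.comb(n, k)


n = 20
for k in range(1, 6):
    print(
        f"k={k}: {math.perm(n, k)} permutations, "
        f"{math.comb(n, k)} combinations, factor {wasted_factor(n, k)}"
    )
```

For the k = 2 case in the question the factor is only 2, but it grows factorially as k increases.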


10-15 07:29