Problem description
I am using a list of lists with varying sizes. For example, alternativesList can contain 4 lists in one iteration and 7 lists in another.
What I am trying to do is capture every combination of words across the different lists.
Let's say:
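import itertools

alternativesList = []
a = [1, 2, 3]
alternativesList.append(a)
b = ["a", "b", "c"]
alternativesList.append(b)
# Lazy iterator: no tuples are generated until it is iterated.
productList = itertools.product(*alternativesList)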
which, once iterated (for example with list(productList)), yields:
[(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'b'), (2, 'c'), (3, 'a'), (3, 'b'), (3, 'c')]
One problem is that productList can be so large that building it in full causes memory problems. So I keep productList as the lazy iterator object that itertools.product returns and iterate over it later.
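For illustration, a minimal sketch of that lazy pattern, assuming a hypothetical third list added to alternativesList. The number of combinations is the product of the list lengths, and itertools.product hands out one tuple at a time, so memory stays proportional to the input lists (which product() stores internally) rather than to the number of combinations. math.prod needs Python 3.8 or later.

```python
import itertools
import math

# Hypothetical input: a list of lists of varying length, as in the question.
alternativesList = [[1, 2, 3], ["a", "b", "c"], ["x", "y"]]

# The number of combinations is the product of the list lengths;
# computing it does not build any of the tuples.
n_combinations = math.prod(len(lst) for lst in alternativesList)

# itertools.product returns a lazy iterator: tuples are produced one at a time.
productList = itertools.product(*alternativesList)

count = 0
for combo in productList:
    count += 1  # process `combo` here instead of storing it

print(n_combinations, count)  # 18 18
```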
Is there a way to create the same object with numpy that works faster than itertools?
Recommended answer
You can avoid some of the problems caused by numpy trying to find a single catch-all dtype for mixed inputs (such as integers and strings) by explicitly specifying a compound dtype:
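For illustration, a minimal sketch of the difference, assuming NumPy's default type promotion: a plain array built from an int and a str is promoted to a Unicode string dtype, while a structured (compound) dtype keeps one native type per field. The field names f0/f1 mirror the naming used in cartesian_product_mixed_type; the sample values are made up.

```python
import numpy as np

# Mixing an int and a str in a plain array promotes both to a common
# dtype, here a Unicode string type, so the integer becomes text.
mixed = np.array([1, "a"])
print(mixed.dtype.kind)  # U

# A compound (structured) dtype keeps one native type per field instead.
dt = np.dtype([("f0", np.int64), ("f1", "U1")])
rec = np.array([(1, "a"), (2, "b")], dtype=dt)
print(rec["f0"].dtype, rec["f1"].dtype)  # int64 <U1
print(rec["f0"] + 10)  # [11 12]
```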
Code + some timings:
import itertools
import timeit

import numpy as np


def cartesian_product_mixed_type(*arrays):
    arrays = *map(np.asanyarray, arrays),
    # Compound dtype: one field per input, each keeping that input's own dtype.
    dtype = np.dtype([(f'f{i}', a.dtype) for i, a in enumerate(arrays)])
    # Output grid with one axis per input array.
    out = np.empty((*map(len, arrays),), dtype)
    # (slice(None), None, None, ...): each None inserts a new length-1 axis.
    idx = slice(None), *itertools.repeat(None, len(arrays) - 1)
    for i, a in enumerate(arrays):
        # Input i gets len(arrays) - 1 - i trailing length-1 axes, so
        # broadcasting spreads it along axis i of the output grid.
        out[f'f{i}'] = a[idx[:len(arrays) - i]]
    # C-order flattening gives the same ordering as itertools.product.
    return out.ravel()


a = np.arange(4)
# Code points of 'A'..'C' reinterpreted as 1-character strings.
b = np.arange(*map(ord, ('A', 'D')), dtype=np.int32).view('U1')
c = np.arange(2.)
np.set_printoptions(threshold=10)
print(f'a={a}')
print(f'b={b}')
print(f'c={c}')
print('itertools')
print(list(itertools.product(a, b, c)))
print('numpy')
print(cartesian_product_mixed_type(a, b, c))

a = np.arange(100)
b = np.arange(*map(ord, ('A', 'z')), dtype=np.int32).view('U1')
c = np.arange(20.)
# timeit returns total seconds for 1000 calls, which equals ms per call.
kwds = dict(globals=globals(), number=1000)
print()
print(f'a={a}')
print(f'b={b}')
print(f'c={c}')
print(f"itertools: {timeit.timeit('list(itertools.product(a,b,c))', **kwds):7.4f} ms")
print(f"numpy: {timeit.timeit('cartesian_product_mixed_type(a,b,c)', **kwds):7.4f} ms")

a = np.arange(1000)
b = np.arange(1000, dtype=np.int32).view('U1')
print()
print(f'a={a}')
print(f'b={b}')
print(f"itertools: {timeit.timeit('list(itertools.product(a,b))', **kwds):7.4f} ms")
print(f"numpy: {timeit.timeit('cartesian_product_mixed_type(a,b)', **kwds):7.4f} ms")
Sample output:
a=[0 1 2 3]
b=['A' 'B' 'C']
c=[0. 1.]
itertools
[(0, 'A', 0.0), (0, 'A', 1.0), (0, 'B', 0.0), (0, 'B', 1.0), (0, 'C', 0.0), (0, 'C', 1.0), (1, 'A', 0.0), (1, 'A', 1.0), (1, 'B', 0.0), (1, 'B', 1.0), (1, 'C', 0.0), (1, 'C', 1.0), (2, 'A', 0.0), (2, 'A', 1.0), (2, 'B', 0.0), (2, 'B', 1.0), (2, 'C', 0.0), (2, 'C', 1.0), (3, 'A', 0.0), (3, 'A', 1.0), (3, 'B', 0.0), (3, 'B', 1.0), (3, 'C', 0.0), (3, 'C', 1.0)]
numpy
[(0, 'A', 0.) (0, 'A', 1.) (0, 'B', 0.) ... (3, 'B', 1.) (3, 'C', 0.)
(3, 'C', 1.)]
a=[ 0 1 2 ... 97 98 99]
b=['A' 'B' 'C' ... 'w' 'x' 'y']
c=[ 0. 1. 2. ... 17. 18. 19.]
itertools: 7.4339 ms
numpy: 1.5701 ms
a=[ 0 1 2 ... 997 998 999]
b=['' '\x01' '\x02' ... 'ϥ' 'Ϧ' 'ϧ']
itertools: 62.6357 ms
numpy: 8.0249 ms
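To make the indexing trick in cartesian_product_mixed_type easier to follow, here is a small standalone sketch. It builds the same index tuple the function uses, applies it to three small example inputs of lengths 4, 3 and 2, prints the shape each input takes after a[idx[:len(arrays) - i]], and confirms that each one broadcasts to the full (4, 3, 2) output grid.

```python
import itertools

import numpy as np

# Small inputs of lengths 4, 3 and 2, like the first example above.
arrays = (np.arange(4), np.array(list("ABC")), np.arange(2.0))
n = len(arrays)
out_shape = tuple(len(a) for a in arrays)  # (4, 3, 2)

# The same index tuple the function builds: (slice(None), None, None).
idx = (slice(None), *itertools.repeat(None, n - 1))

shapes = []
for i, a in enumerate(arrays):
    view = a[idx[: n - i]]
    shapes.append(view.shape)
    # Broadcasting aligns shapes from the right, so (3, 1) lines up with
    # axes 1 and 2 of (4, 3, 2) and varies only along axis 1.
    print(i, view.shape, np.broadcast_to(view, out_shape).shape)
# 0 (4, 1, 1) (4, 3, 2)
# 1 (3, 1) (4, 3, 2)
# 2 (2,) (4, 3, 2)

# The broadcast copy of input 1 holds 'B' wherever the axis-1 index is 1.
grid_b = np.broadcast_to(arrays[1][idx[: n - 1]], out_shape)
print(grid_b[2, 1, 0], grid_b[0, 1, 1])  # B B
```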
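Note that cartesian_product_mixed_type allocates the entire product at once (np.empty with one axis per input), so for the very large products the question worries about, it trades memory for speed. A possible middle ground, sketched here under the assumption that processing rows in fixed-size blocks is acceptable, is a generator that builds the product one chunk at a time with np.unravel_index. iter_product_chunks and its chunk_size parameter are hypothetical names for this sketch, not part of NumPy or of the answer above.

```python
import itertools

import numpy as np


def iter_product_chunks(*arrays, chunk_size=100_000):
    """Hypothetical helper: yield the Cartesian product of `arrays` as
    structured arrays of at most `chunk_size` rows, in the same order
    as itertools.product."""
    arrays = [np.asanyarray(a) for a in arrays]
    shape = tuple(len(a) for a in arrays)
    total = int(np.prod(shape, dtype=np.int64))
    # Same compound dtype idea as cartesian_product_mixed_type.
    dtype = np.dtype([(f"f{i}", a.dtype) for i, a in enumerate(arrays)])
    for start in range(0, total, chunk_size):
        # Flat (C-order) positions of the rows in this chunk.
        flat = np.arange(start, min(start + chunk_size, total))
        # One index array per input; the last input varies fastest,
        # which is the ordering itertools.product uses.
        positions = np.unravel_index(flat, shape)
        out = np.empty(len(flat), dtype)
        for i, (a, pos) in enumerate(zip(arrays, positions)):
            out[f"f{i}"] = a[pos]
        yield out


# Usage on small example inputs.
a = np.arange(4)
b = np.array(list("ABC"))
c = np.arange(2.0)

chunks = list(iter_product_chunks(a, b, c, chunk_size=5))
print([len(ch) for ch in chunks])  # [5, 5, 5, 5, 4]
combined = np.concatenate(chunks)
print(combined[:3].tolist())  # [(0, 'A', 0.0), (0, 'A', 1.0), (0, 'B', 0.0)]
```

In real use you would process each chunk inside a for loop instead of collecting them with list(); chunk_size then bounds peak memory. No timings are claimed for this variant; it would need to be measured the same way as above.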