Problem description
I know about itertools.product for iterating over a list of several dimensions of keywords. For instance, if I have this:
categories = [
    ['A', 'B', 'C', 'D'],
    ['E', 'F', 'G', 'H'],
    ['I', 'J', 'K', 'L'],
]
and I use itertools.product() over it, I get something like:
>>> [x for x in itertools.product(*categories)]
[('A', 'E', 'I'),
 ('A', 'E', 'J'),
 ('A', 'E', 'K'),
 ('A', 'E', 'L'),
 ('A', 'F', 'I'),
 ('A', 'F', 'J'),
 # and so on...
Is there an equivalent, straightforward way of doing the same thing with numpy's arrays?
Recommended answer
This question has been asked a couple of times already:
The first link has a working numpy solution that is claimed to be several times faster than itertools, though no benchmarks are provided. The code was written by a user named pv. Please follow the link and upvote his answer if you find it useful:
import numpy as np

def cartesian(arrays, out=None):
    """
    Generate a cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the cartesian product of.
    out : ndarray
        Array to place the cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])
    """
    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype  # the output takes the dtype of the first array

    n = np.prod([x.size for x in arrays])  # total number of rows
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    # Integer division: m is used as a repeat count and a slice bound.
    # (The original Python 2 code used `/` here, which yields a float on Python 3.)
    m = n // arrays[0].size
    out[:, 0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        # Fill the first block of the remaining columns recursively...
        cartesian(arrays[1:], out=out[0:m, 1:])
        # ...then copy that block for every other value of the first array.
        # (`xrange` in the original; `range` on Python 3.)
        for j in range(1, arrays[0].size):
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
    return out
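For comparison, here is a shorter sketch of the same idea built on np.meshgrid, applied to the categories list from the question. This is not pv's code: the helper name cartesian_meshgrid is made up for illustration, and the sketch assumes all inputs are 1-D and share a compatible dtype (mixed dtypes would be coerced to a common one when stacked).

```python
import itertools

import numpy as np


def cartesian_meshgrid(arrays):
    """Hypothetical helper: cartesian product of 1-D arrays as a 2-D array."""
    # indexing='ij' makes the first array the slowest-varying column,
    # which matches the row order produced by itertools.product.
    grids = np.meshgrid(*arrays, indexing='ij')
    # Stack the grids along a new last axis, then flatten to (M, len(arrays)).
    return np.stack(grids, axis=-1).reshape(-1, len(arrays))


categories = [
    ['A', 'B', 'C', 'D'],
    ['E', 'F', 'G', 'H'],
    ['I', 'J', 'K', 'L'],
]

result = cartesian_meshgrid(categories)
expected = np.array(list(itertools.product(*categories)))

print(result.shape)                      # (64, 3)
print(np.array_equal(result, expected))  # True
```

The result is a single array holding every combination at once, so it suits cases where the whole table is needed for vectorized work rather than one tuple at a time.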
Nevertheless, in the same post Alex Martelli, a great Python guru on SO, wrote that itertools was the fastest way to get this task done. So here is a quick benchmark that supports Alex's claim.
import numpy as np
import time
import itertools

def cartesian(arrays, out=None):
    ...

def test_numpy(arrays):
    # Build the full product array, then iterate over its rows.
    for res in cartesian(arrays):
        pass

def test_itertools(arrays):
    # Iterate over the product lazily, one tuple at a time.
    for res in itertools.product(*arrays):
        pass

def main():
    arrays = [np.fromiter(range(100), dtype=int), np.fromiter(range(100, 200), dtype=int)]

    # time.clock() was removed in Python 3.8; time.perf_counter() replaces it.
    start = time.perf_counter()
    for _ in range(100):
        test_numpy(arrays)
    print(time.perf_counter() - start)

    start = time.perf_counter()
    for _ in range(100):
        test_itertools(arrays)
    print(time.perf_counter() - start)

if __name__ == '__main__':
    main()
Output:
0.421036
0.06742
So, you should definitely use itertools.
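To re-run this kind of comparison on your own machine, the timing loop can also be written with the standard timeit module, which handles the repetition for you. This is only a sketch under stated assumptions: np.meshgrid stands in for cartesian() so that the block is self-contained, the function names are made up for illustration, and no timings are quoted because they depend on the Python and numpy versions in use.

```python
import itertools
import timeit

import numpy as np

arrays = [np.arange(100), np.arange(100, 200)]


def iterate_itertools():
    # Consume the product lazily, one tuple at a time.
    for _ in itertools.product(*arrays):
        pass


def iterate_numpy_rows():
    # Build the full product as a 2-D array first (meshgrid is a stand-in
    # for cartesian() here), then loop over its rows in Python.
    grids = np.meshgrid(*arrays, indexing='ij')
    table = np.stack(grids, axis=-1).reshape(-1, len(arrays))
    for _ in table:
        pass


# number=100 mirrors the 100 iterations of the benchmark in the answer;
# taking the minimum of the repeats reduces noise from other processes.
for func in (iterate_itertools, iterate_numpy_rows):
    best = min(timeit.repeat(func, number=100, repeat=3))
    print(f"{func.__name__}: {best:.4f} s per 100 calls")
```

Like the benchmark in the answer, both functions visit every combination in a Python-level loop, so this measures iteration cost rather than the cost of building the array alone.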