Suppose I define only the following functions to set up the kind of function topology in question:
import numpy as np

def foo(x,y):
    # repeat x, y times, and return the result as an array
    return np.asarray([x for i in range(y)])
bar = lambda x: foo(x,10)
barv = np.vectorize(bar)
z = np.asarray([1, 2, 3])
and the following routine:
for i in range(z.shape[0]):
    rng = np.arange(z[i],100)
    # res = barv(rng)
    res = np.asarray(list(map(bar,rng)))
The routine above works. However, if I uncomment the commented line and run the vectorized version instead, i.e.:
for i in range(z.shape[0]):
    rng = np.arange(z[i],100)
    res = barv(rng)
the code fails with the following error:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-14-660195661e55>", line 3, in <module>
    res = barv(rng)
  File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2091, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2170, in _vectorize_call
    res = array(outputs, copy=False, subok=True, dtype=otypes[0])
ValueError: setting an array element with a sequence.
The error makes sense. But surely there must be some way to do a vectorized one-to-many operation in numpy?
Best answer
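vectorize is meant for scalar functions, that is, functions that take scalar inputs and return scalar outputs. Your foo and bar return arrays instead: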
In [729]: foo(z,10)
Out[729]:
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
In [730]: bar(z)
Out[730]:
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
Given a 1d input, your bar returns a 2d array. Given a scalar input, it returns a 1d array:
In [734]: bar(4)
Out[734]: array([4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
We can tell vectorize to expect an object return:
In [735]: barv = np.vectorize(bar, otypes=[object])
In [736]: barv(4)
Out[736]: array([4, 4, 4, 4, 4, 4, 4, 4, 4, 4], dtype=object)
In [737]: barv(z)
Out[737]:
array([array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2]),
array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])], dtype=object)
That object array can be converted to a 2d array:
In [738]: np.stack(_)
Out[738]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])
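vectorize also has a signature parameter that may help in a case like this, though in my experience it is even slower. But we don't need vectorize here. A simple list comprehension does the same job, probably better:
In [739]: np.stack([bar(i) for i in z])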
Out[739]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])
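For completeness, here is a minimal sketch of the signature route mentioned above. It assumes the same foo and bar as in the question; the name bar_sig is mine, purely for illustration. The signature '()->(n)' tells vectorize that each call takes a scalar and returns a 1d array, so the results are assembled into a regular 2d array without the object-dtype detour.

```python
import numpy as np

def foo(x, y):
    # Same function as in the question: repeat x, y times.
    return np.asarray([x for i in range(y)])

bar = lambda x: foo(x, 10)

# '()->(n)': scalar input, 1d output with core dimension n.
# bar_sig is a hypothetical name used only in this sketch.
bar_sig = np.vectorize(bar, signature='()->(n)')

z = np.asarray([1, 2, 3])
res = bar_sig(z)
print(res.shape)  # (3, 10)
```

This is still a Python-level loop under the hood, so treat it as a convenience for getting the output shape right, not as a speed-up.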
vectorize makes "broadcasting" over several input arrays easier, but it does not improve speed. See whether you can work out what vectorize is doing with foo here:
In [743]: f = np.vectorize(foo, otypes=[object])
In [744]: f(np.array([1,2,3]), np.array([2,3,4]))
Out[744]: array([array([1, 1]), array([2, 2, 2]), array([3, 3, 3, 3])], dtype=object)
In [745]: f(np.array([1,2,3]), np.array([[2],[3]]))
Out[745]:
array([[array([1, 1]), array([2, 2]), array([3, 3])],
[array([1, 1, 1]), array([2, 2, 2]), array([3, 3, 3])]],
dtype=object)
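If the two outputs above look puzzling, the following sketch may help. It checks the broadcast shape of the two inputs with np.broadcast and then lists the length of each array that foo produced. The variable names (x, y_same, y_col, lengths) are my own choices for illustration; foo is the function from the question.

```python
import numpy as np

def foo(x, y):
    # Same function as in the question: repeat x, y times.
    return np.asarray([x for i in range(y)])

f = np.vectorize(foo, otypes=[object])

x = np.array([1, 2, 3])        # shape (3,)
y_same = np.array([2, 3, 4])   # shape (3,)
y_col = np.array([[2], [3]])   # shape (2, 1)

# vectorize broadcasts its inputs against each other like a ufunc does.
print(np.broadcast(x, y_same).shape)  # (3,)
print(np.broadcast(x, y_col).shape)   # (2, 3)

# foo is called once per broadcast element; each cell holds one result.
out = f(x, y_col)
lengths = [[len(a) for a in row] for row in out]
print(lengths)  # [[2, 2, 2], [3, 3, 3]]
```

So the outer (object) array has the broadcast shape of the inputs, and each cell holds whatever foo returned for that pair of scalars.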
Edit
The proper numpy vectorization:
In [762]: np.repeat(z[:,None],10,1)
Out[762]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])
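Applied to the loop from the question, the same idea might look like the sketch below. It assumes the goal is just to repeat every element of rng 10 times along a new axis, which is what bar does. The names results, res and view are mine. np.broadcast_to is shown as an optional alternative that returns a read-only view instead of a copy.

```python
import numpy as np

z = np.asarray([1, 2, 3])

results = []
for start in z:
    rng = np.arange(start, 100)
    # One row per element of rng, each element repeated 10 times.
    res = np.repeat(rng[:, None], 10, axis=1)
    results.append(res)

print([r.shape for r in results])  # [(99, 10), (98, 10), (97, 10)]

# Alternative without copying: a read-only broadcast view of the same values.
rng = np.arange(z[0], 100)
view = np.broadcast_to(rng[:, None], (rng.size, 10))
```

The outer loop over z stays, because each rng has a different length and the results cannot be stacked into one rectangular array.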
Some timing comparisons:
In [766]: timeit np.stack(barv(z))
60 µs ± 1.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [767]: timeit np.stack([bar(i) for i in z])
39.6 µs ± 132 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [768]: timeit np.repeat(z[:,None],10,1)
4.12 µs ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Your foo already works with array input. There is no need for the np.vectorize wrapper; just transpose the result:
In [783]: foo(z,10).T
Out[783]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])
In [784]: timeit foo(z,10).T
10.8 µs ± 364 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)