Suppose I define only the following functions to reproduce the problem:

import numpy as np

def foo(x, y):
  # Build an array holding y copies of x
  return np.asarray([x for i in range(y)])

bar = lambda x: foo(x, 10)
barv = np.vectorize(bar)

z = np.asarray([1, 2, 3])


and the following routine:

for i in range(z.shape[0]):
  rng = np.arange(z[i],100)
  # res = barv(rng)
  res = np.asarray(list(map(bar,rng)))


The routine above works. However, if I uncomment the vectorized call and run that version instead, i.e.:

for i in range(z.shape[0]):
  rng = np.arange(z[i],100)
  res = barv(rng)


the code fails with the following error:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-14-660195661e55>", line 3, in <module>
    res = barv(rng)
  File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2091, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2170, in _vectorize_call
    res = array(outputs, copy=False, subok=True, dtype=otypes[0])
ValueError: setting an array element with a sequence.


The error makes sense. But surely there must be some way to do a vectorized one-to-many operation in numpy?

Best answer

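vectorize is meant for scalar functions: functions that take scalar inputs and return a scalar output. Your foo and bar do not fit that pattern: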

In [729]: foo(z,10)
Out[729]:
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
In [730]: bar(z)
Out[730]:
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])


Given a 1d input, your bar returns a 2d array. Given a scalar input, it returns a 1d array:

In [734]: bar(4)
Out[734]: array([4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
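To make the contrast concrete, here is a minimal sketch of the two cases. The names square, squares and failed are mine, and bar is rewritten as a plain function with the same behaviour as the lambda in the question. As far as I understand it, when otypes is not given, vectorize infers a scalar output dtype from a trial call, which is why the array-returning case ends in the ValueError shown in the question.

```python
import numpy as np

def bar(x):
    # Same behaviour as the question's bar: one scalar in, ten copies out.
    return np.asarray([x for _ in range(10)])

# Scalar in, scalar out: the case np.vectorize is designed for.
square = np.vectorize(lambda x: x * x)
squares = square(np.array([1, 2, 3]))

# Scalar in, array out: without otypes the call is expected to fail,
# because each 10-element result cannot be stored in one numeric slot.
try:
    np.vectorize(bar)(np.array([1, 2, 3]))
    failed = False
except ValueError as err:
    failed = True
    print("vectorize failed:", err)
```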


We can tell vectorize to expect an object return:

In [735]: barv = np.vectorize(bar, otypes=[object])
In [736]: barv(4)
Out[736]: array([4, 4, 4, 4, 4, 4, 4, 4, 4, 4], dtype=object)
In [737]: barv(z)
Out[737]:
array([array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
       array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2]),
       array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])], dtype=object)


That result can be converted to a 2d array:

In [738]: np.stack(_)
Out[738]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])


vectorize also has a signature parameter, which may help in this case. But in my experience it is even slower.
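For reference, a sketch of what the signature route might look like here. The signature string '()->(n)' is my reading of this problem (one scalar in, one 1d array out); barv_sig and res are names I made up for the example.

```python
import numpy as np

def bar(x):
    # Same behaviour as the question's bar: one scalar in, ten copies out.
    return np.asarray([x for _ in range(10)])

z = np.asarray([1, 2, 3])

# '()->(n)': each call takes a scalar and returns a 1d array of length n.
barv_sig = np.vectorize(bar, signature='()->(n)')

# The per-element results are assembled into a regular 2d array,
# so no otypes=[object] and no np.stack are needed afterwards.
res = barv_sig(z)
print(res.shape)
```

This removes the object-array step, but it still calls bar once per element in Python, so I would not expect it to be faster.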

But we don't need vectorize here; a simple list comprehension does the same job, and is probably better:

In [739]: np.stack([bar(i) for i in z])
Out[739]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])


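vectorize makes it easier to "broadcast" several input arrays against each other, but it does not improve speed. See if you can figure out what vectorize is doing with foo here: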

In [743]: f = np.vectorize(foo, otypes=[object])
In [744]: f(np.array([1,2,3]), np.array([2,3,4]))
Out[744]: array([array([1, 1]), array([2, 2, 2]), array([3, 3, 3, 3])], dtype=object)
In [745]: f(np.array([1,2,3]), np.array([[2],[3]]))
Out[745]:
array([[array([1, 1]), array([2, 2]), array([3, 3])],
       [array([1, 1, 1]), array([2, 2, 2]), array([3, 3, 3])]],
      dtype=object)
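If the shapes in the second call are puzzling, this sketch reproduces the pairing with plain numpy broadcasting tools, without vectorize. The variable names (x, y, shape, pairs) are just for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])      # shape (3,)
y = np.array([[2], [3]])     # shape (2, 1)

# The two inputs broadcast against each other to a common shape.
shape = np.broadcast(x, y).shape

# Expand both inputs to that shape and list the (x, y) pair for each cell.
xb, yb = np.broadcast_arrays(x, y)
pairs = [[(int(a), int(b)) for a, b in zip(xrow, yrow)]
         for xrow, yrow in zip(xb, yb)]
print(shape)
print(pairs)
```

Each cell of the object array above holds foo(x, y) for the matching pair: the first row repeats each x twice, the second row three times.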


Edit

Proper numpy vectorization:

In [762]: np.repeat(z[:,None],10,1)
Out[762]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])


Some timing comparisons:

In [766]: timeit np.stack(barv(z))
60 µs ± 1.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [767]: timeit np.stack([bar(i) for i in z])
39.6 µs ± 132 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [768]: timeit np.repeat(z[:,None],10,1)
4.12 µs ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


Your foo already works with array inputs. There is no need for the np.vectorize wrapper.

In [783]: foo(z,10).T
Out[783]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])
In [784]: timeit foo(z,10).T
10.8 µs ± 364 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
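Applied to the loop in the question, the np.repeat approach might look like the sketch below. The results list and the start variable are my additions so the per-iteration arrays can be inspected; the question itself just overwrites res on each pass.

```python
import numpy as np

z = np.asarray([1, 2, 3])

results = []
for start in z:
    rng = np.arange(start, 100)
    # One row per element of rng, each row holding 10 copies of that element.
    # This matches np.asarray(list(map(bar, rng))) from the question.
    res = np.repeat(rng[:, None], 10, axis=1)
    results.append(res)

print([r.shape for r in results])
```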

09-25 19:15