Problem description
Suppose an array a.shape == (N, M) and a vector v.shape == (N,). The goal is to compute, for each element of a, the argmin of the absolute differences against v, that is:
out = np.zeros((N, M), dtype=int)
for i in range(N):
    for j in range(M):
        out[i, j] = np.argmin(np.abs(a[i, j] - v))
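On a tiny input (a hypothetical example, not from the question), the loop yields, for every element of a, the index of the nearest entry of v:

```python
import numpy as np

# N = 2, M = 3
a = np.array([[0.1, 0.6, 0.9],
              [0.4, 0.45, 0.7]])
v = np.array([0.0, 1.0])

out = np.zeros(a.shape, dtype=int)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        # Index of the entry of v closest to a[i, j]
        out[i, j] = np.argmin(np.abs(a[i, j] - v))

print(out)  # [[0 1 1]
            #  [0 0 1]]
```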
I have a vectorized implementation via np.matlib.repmat, and it's much faster, but it takes O(M*N^2) memory, which is unacceptable in practice. The computation is still done on the CPU, so the best bet seems to be implementing the for-loop in C as an extension, but maybe NumPy already has this logic implemented.

Does it? Are there any ready-to-use NumPy functions that implement the above efficiently?
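For reference, the memory-hungry vectorized approach described above can be written with broadcasting instead of np.matlib.repmat (a sketch, not the asker's actual code). It materializes an (N, M, N) difference table, which is exactly where the O(M*N^2) memory goes:

```python
import numpy as np

def find_closest_bcast(a, v):
    # a[:, :, None] has shape (N, M, 1); subtracting v broadcasts to
    # (N, M, len(v)) - the full table of differences, hence O(M*N^2)
    # memory when len(v) == N.
    return np.abs(a[:, :, None] - v).argmin(axis=-1)
```

This matches the double loop exactly, but is only practical for small N.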
Recommended answer
Inspired by this post, we can leverage np.searchsorted:
def find_closest(a, v):
    # Sort v once so that np.searchsorted can be used
    sidx = v.argsort()
    v_s = v[sidx]
    # For each element of a, index of the first sorted value >= it
    idx = np.searchsorted(v_s, a)
    idx[idx == len(v)] = len(v) - 1
    idx0 = (idx - 1).clip(min=0)
    # Step back to the left neighbor where it is at least as close
    m = np.abs(a - v_s[idx]) >= np.abs(v_s[idx0] - a)
    m[idx == 0] = False
    idx[m] -= 1
    # Map sorted positions back to indices into the original v
    out = sidx[idx]
    return out
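A quick sanity check that the searchsorted-based function matches the brute-force loop on random data (find_closest is repeated here only so the snippet runs standalone):

```python
import numpy as np

def find_closest(a, v):
    # Same function as in the answer above
    sidx = v.argsort()
    v_s = v[sidx]
    idx = np.searchsorted(v_s, a)
    idx[idx == len(v)] = len(v) - 1
    idx0 = (idx - 1).clip(min=0)
    m = np.abs(a - v_s[idx]) >= np.abs(v_s[idx0] - a)
    m[idx == 0] = False
    idx[m] -= 1
    return sidx[idx]

rng = np.random.default_rng(42)
a = rng.random((4, 6))
v = rng.random(4)

# Reference result from the question's double loop
brute = np.array([[np.argmin(np.abs(a[i, j] - v)) for j in range(a.shape[1])]
                  for i in range(a.shape[0])])
assert (find_closest(a, v) == brute).all()
```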
Some more performance boost with numexpr on large datasets:
import numexpr as ne

def find_closest_v2(a, v):
    sidx = v.argsort()
    v_s = v[sidx]
    idx = np.searchsorted(v_s, a)
    idx[idx == len(v)] = len(v) - 1
    idx0 = (idx - 1).clip(min=0)
    p1 = v_s[idx]
    p2 = v_s[idx0]
    # Evaluate the neighbor-selection mask with numexpr
    # ('a' is included in local_dict since it is a function local)
    m = ne.evaluate('(idx != 0) & (abs(a - p1) >= abs(p2 - a))',
                    {'p1': p1, 'p2': p2, 'idx': idx, 'a': a})
    idx[m] -= 1
    out = sidx[idx]
    return out
Timings

Setup:
N,M = 500,100000
a = np.random.rand(N,M)
v = np.random.rand(N)
In [22]: %timeit find_closest_v2(a, v)
4.35 s ± 21.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [23]: %timeit find_closest(a, v)
4.69 s ± 173 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)