
Problem description

Suppose an array a.shape == (N, M) and a vector v.shape == (N,). The goal is to compute the argmin of the absolute difference between v and every element of a - that is,

out = np.zeros((N, M), dtype=int)
for i in range(N):
    for j in range(M):
        # index into v of the value closest to a[i, j]
        out[i, j] = np.argmin(np.abs(a[i, j] - v))

I have a vectorized implementation via np.matlib.repmat, and it's much faster, but it takes O(M*N^2) memory, which is unacceptable in practice. Computation is still done on the CPU, so the best bet seems to be implementing the for-loop in C as an extension, but maybe NumPy already has this logic implemented.
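The repmat-based implementation isn't shown in the question; for reference, a memory-heavy vectorized version could be sketched with broadcasting instead (the function name here is illustrative, not from the question):

```python
import numpy as np

def find_closest_naive_vectorized(a, v):
    # Broadcast a (N, M) against v (N,) to shape (N, M, N):
    # diff[i, j, k] = |a[i, j] - v[k]|; argmin over k picks the
    # closest entry of v. The intermediate array costs O(M*N^2) memory.
    diff = np.abs(a[:, :, None] - v[None, None, :])
    return np.argmin(diff, axis=-1)

a = np.array([[1.0, 9.0],
              [2.9, 5.0],
              [4.0, 0.2]])
v = np.array([3.0, 0.0, 10.0])
result = find_closest_naive_vectorized(a, v)  # [[1, 2], [0, 0], [0, 1]]
```

This makes the memory problem concrete: for N = 500, M = 100000 (the timing setup below), the intermediate would hold 500 * 100000 * 500 floats.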

Does it? Are there any ready-to-use NumPy functions that implement the above efficiently?

Recommended answer

Inspired by this post, we can leverage np.searchsorted -

def find_closest(a, v):
    # sort v and keep the permutation to map back to original indices
    sidx = v.argsort()
    v_s = v[sidx]
    # insertion position of every element of a into the sorted v
    idx = np.searchsorted(v_s, a)
    idx[idx==len(v)] = len(v)-1
    idx0 = (idx-1).clip(min=0)
    
    # step back one position wherever the left neighbour is at least as close
    m = np.abs(a-v_s[idx]) >= np.abs(v_s[idx0]-a)
    m[idx==0] = 0
    idx[m] -= 1
    # translate sorted positions back to indices into the original v
    out = sidx[idx]
    return out
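As a quick sanity check, the output can be compared against the question's reference loop on small random inputs (find_closest is repeated here so the snippet runs standalone):

```python
import numpy as np

def find_closest(a, v):
    # same function as above, repeated so this check is self-contained
    sidx = v.argsort()
    v_s = v[sidx]
    idx = np.searchsorted(v_s, a)
    idx[idx==len(v)] = len(v)-1
    idx0 = (idx-1).clip(min=0)
    m = np.abs(a-v_s[idx]) >= np.abs(v_s[idx0]-a)
    m[idx==0] = 0
    idx[m] -= 1
    return sidx[idx]

rng = np.random.default_rng(42)
N, M = 4, 6
a = rng.random((N, M))
v = rng.random(N)

# reference double loop from the question
ref = np.zeros((N, M), dtype=int)
for i in range(N):
    for j in range(M):
        ref[i, j] = np.argmin(np.abs(a[i, j] - v))

print(np.array_equal(find_closest(a, v), ref))  # True for tie-free inputs
```

Note that when an element of a is exactly equidistant from two entries of v, the tie-breaking may differ from np.argmin (which keeps the first occurrence); with continuous random data this does not occur.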

Some more performance boost with numexpr on large datasets:

import numexpr as ne

def find_closest_v2(a, v):
    sidx = v.argsort()
    v_s = v[sidx]
    idx = np.searchsorted(v_s, a)
    idx[idx==len(v)] = len(v)-1
    idx0 = (idx-1).clip(min=0)
    
    p1 = v_s[idx]
    p2 = v_s[idx0]
    # evaluate the mask with numexpr; pass every referenced array in the
    # local dict so the expression does not rely on frame-variable lookup
    m = ne.evaluate('(idx!=0) & (abs(a-p1) >= abs(p2-a))',
                    {'a':a, 'p1':p1, 'p2':p2, 'idx':idx})
    idx[m] -= 1
    out = sidx[idx]
    return out

Timings

Setup:

N,M = 500,100000
a = np.random.rand(N,M)
v = np.random.rand(N)

In [22]: %timeit find_closest_v2(a, v)
4.35 s ± 21.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [23]: %timeit find_closest(a, v)
4.69 s ± 173 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

