mba的有效平方欧几里德距离代码比numpy的有效欧几里得距离编

mba的有效平方欧几里德距离代码比numpy的有效欧几里得距离编

本文介绍了是否期望numba的有效平方欧几里德距离代码比numpy的有效欧几里得距离编码慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我从(为什么这个numba代码比numpy代码慢6倍?),以便它可以处理x1为(n,m)

I modify the most efficient code from (Why this numba code is 6x slower than numpy code?) so that it can handle x1 being (n, m)

@nb.njit(fastmath=True,parallel=True)
def euclidean_distance_square_numba_v5(x1, x2):
    res = np.empty((x1.shape[0], x2.shape[0]), dtype=x2.dtype)
    for a_idx in nb.prange(x1.shape[0]):
        for o_idx in range(x2.shape[0]):
            val = 0.
            for i_idx in range(x2.shape[1]):
                tmp = x1[a_idx, i_idx] - x2[o_idx, i_idx]
                val += tmp * tmp
            res[a_idx, o_idx] = val
    return res

但是,效率更高的numpy版本仍然没有效率:

However, it is still not more efficient that the more efficient numpy's version:

def euclidean_distance_square_einsum(x1, x2):
    return np.einsum('ij,ij->i', x1, x1)[:, np.newaxis] + np.einsum('ij,ij->i', x2, x2) - 2*np.dot(x1, x2.T)

输入为

a = np.zeros((1000000,512), dtype=np.float32)
b = np.zeros((100, 512), dtype=np.float32)

我得到的时间是numba代码为2.4723422527313232和numpy代码为0.8260958194732666.

The timing I got is 2.4723422527313232 for the numba code and 0.8260958194732666 for the numpy code.

推荐答案

是的,这是预期的.

您必须意识到的第一件事:点积是numpy版本的主力军,这里是稍小的数组:

The first thing you must be aware of: dot-product is the working horse of the numpy-version, here for slightly smaller arrays:

>>> def only_dot(x1, x2):
        return - 2*np.dot(x1, x2.T)

>>> a = np.zeros((1000,512), dtype=np.float32)
>>> b = np.zeros((100, 512), dtype=np.float32)

>>> %timeit(euclidean_distance_square_einsum(a,b))
6.08 ms ± 312 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit(euclidean_only_dot(a,b))
5.25 ms ± 330 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

即花费了85%的时间.

i.e. 85% of the time are spent in it.

当您查看numba代码时,它看起来像是矩阵矩阵乘法的某种奇怪/不寻常/更复杂的版本-例如,可以看到相同的三个循环.

When you look at your numba-code, that looks like a somewhat strange/unusual/more complicated version of matrix-matrix-multiplication - one could see for example the same three loops.

因此,基本上,您正在尝试击败最好的最佳算法之一.例如,这是有人尝试这样做并失败.我的安装使用Intel的MKL版本,该版本必须比默认实现更为复杂,默认实现可以在此处.

So basically, you are trying to beat one of the best optimized algorithms out there. Here is for example somebody trying to do it and failing. My installation uses Intel's MKL-version, which must be more sophisticated than the default-implementation, which can be found here.

有时候,在享受了整个乐趣之后,人们不得不承认自己的重新发明的轮子"不如最新的轮子好……但是只有这样,人们才能真正欣赏它的性能.

Sometimes, after the whole fun one had, one has to admit that the own "reinvented wheel" is not as good as the state-of-the-art wheel... but only then one can truly appreciate its performance.

这篇关于是否期望numba的有效平方欧几里德距离代码比numpy的有效欧几里得距离编码慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-06 06:01