Problem Description
I have written code using numpy that takes an array of size (m x n)... The rows (m) are individual observations comprised of (n) features... and creates a square distance matrix of size (m x m). This distance matrix is the distance of a given observation from all other observations. E.g. row 0 column 9 is the distance between observation 0 and observation 9.
import numpy as np
#import cupy as np

def l1_distance(arr):
    return np.linalg.norm(arr, 1)

X = np.random.randint(low=0, high=255, size=(700, 4096))
distance = np.empty((700, 700))
for i in range(700):
    for j in range(700):
        distance[i, j] = l1_distance(X[i, :] - X[j, :])
I attempted this on the GPU with CuPy by uncommenting the second import statement, but the double for loop is obviously drastically inefficient: NumPy takes approximately 6 seconds, while CuPy takes 26 seconds. I understand why, but it's not immediately clear to me how to parallelize this process.
I know I'm going to need to write a reduction kernel of some sort, but I can't think of how to construct one CuPy array from iterative operations on the elements of another array.
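For clarity, the per-pair quantity the loop computes is the L1 (Manhattan) distance: np.linalg.norm(v, 1) on a 1-D vector is the sum of absolute values. A minimal sketch (shapes shrunk here from (700, 4096) so the O(m²) loop finishes quickly) confirms the loop produces a symmetric matrix with a zero diagonal, as any distance matrix must:

```python
import numpy as np

# Shapes shrunk from the question's (700, 4096) so the double loop stays fast.
X = np.random.randint(low=0, high=255, size=(40, 64))
m = X.shape[0]

distance = np.empty((m, m))
for i in range(m):
    for j in range(m):
        # ord=1 on a 1-D vector: sum of absolute differences (Manhattan distance)
        distance[i, j] = np.linalg.norm(X[i, :] - X[j, :], 1)

# Basic sanity checks: symmetric, zero diagonal, non-negative.
assert np.allclose(distance, distance.T)
assert np.allclose(np.diag(distance), 0.0)
assert (distance >= 0).all()
```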
Recommended Answer
Using broadcasting, CuPy takes 0.10 seconds on an A100 GPU, compared to 6.6 seconds for NumPy:
for i in range(700):
    distance[i, :] = np.abs(np.broadcast_to(X[i, :], X.shape) - X).sum(axis=1)
This vectorizes the inner loop: the distances from one vector to all other vectors are computed in parallel.
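A minimal NumPy sketch (shapes shrunk for speed; assumed smaller than the question's) showing the row-wise broadcast is equivalent to the double loop. Note that the explicit np.broadcast_to is optional, since NumPy broadcasts an (n,) row against an (m, n) array automatically; a fully vectorized (m, m, n) variant also works, but only when the temporary fits in memory (700 × 700 × 4096 float64 would need roughly 16 GB, so the single loop over rows is a reasonable compromise):

```python
import numpy as np

# Small shapes so the O(m*m*n) check below stays cheap; the question uses (700, 4096).
X = np.random.randint(low=0, high=255, size=(50, 64)).astype(np.float64)
m = X.shape[0]

# Naive double loop from the question.
naive = np.empty((m, m))
for i in range(m):
    for j in range(m):
        naive[i, j] = np.abs(X[i, :] - X[j, :]).sum()

# Row-at-a-time broadcasting (the answer's approach). X[i, :] - X already
# broadcasts the (n,) row against the (m, n) array, so broadcast_to is optional.
rowwise = np.empty((m, m))
for i in range(m):
    rowwise[i, :] = np.abs(X[i, :] - X).sum(axis=1)

# Fully vectorized: builds one (m, m, n) temporary, feasible only when it fits in RAM.
full = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=-1)

assert np.allclose(naive, rowwise)
assert np.allclose(naive, full)
```

The same row-wise code runs unchanged under CuPy (with `import cupy as np`), which is where the reported speedup comes from.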