Finding the maximum/minimum value in CUDA without passing it to the CPU

Question

I need to find the index of the maximum element in an array of floats. I am using the function cublasIsamax, but it returns the index to the CPU, which slows down the application.

Is there a way to compute this index efficiently and store it on the GPU?

Thanks!

Recommended answer

Since the CUBLAS V2 API was introduced (with CUDA 4.0, IIRC), routines that return a scalar or an index can store the result directly in a variable in device memory, rather than in a host variable (which entails a device-to-host transfer and might leave the result in the wrong memory space).

To use this, call cublasSetPointerMode with the CUBLAS_POINTER_MODE_DEVICE mode to tell the CUBLAS context to expect pointers for scalar arguments to be device pointers. This then implies that in a call like

cublasStatus_t cublasIsamax(cublasHandle_t handle, int n,
                            const float *x, int incx, int *result)

result must be a device pointer.
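As a rough illustration of the device pointer mode described above, the following is a minimal sketch rather than a definitive implementation. The kernel read_max, the helper check, the variable names, and the sample data are all hypothetical and exist only for this example. Keep in mind that cublasIsamax reports the 1-based index of the element with the largest absolute value, so the sketch subtracts 1 before indexing.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical consumer kernel: uses the index without it ever leaving the GPU.
// cublasIsamax writes a 1-based index, hence the "- 1".
__global__ void read_max(const float *x, const int *d_idx, float *d_out)
{
    *d_out = x[*d_idx - 1];
}

// Hypothetical helper: abort on any cuBLAS error.
static void check(cublasStatus_t status, const char *what)
{
    if (status != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "%s failed (status %d)\n", what, (int)status);
        exit(EXIT_FAILURE);
    }
}

int main()
{
    // Sample data (assumption): the largest magnitude is -7.5 at 1-based index 2.
    const int n = 5;
    const float h_x[n] = {1.0f, -7.5f, 3.0f, 2.0f, 0.5f};

    float *d_x = nullptr, *d_out = nullptr;
    int *d_idx = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_idx, sizeof(int));   // the result lives in device memory
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    check(cublasCreate(&handle), "cublasCreate");

    // Tell cuBLAS that scalar/index result pointers are device pointers.
    check(cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE),
          "cublasSetPointerMode");

    // The index is written to d_idx on the GPU; no device-to-host transfer here.
    check(cublasIsamax(handle, n, d_x, 1, d_idx), "cublasIsamax");

    // Later GPU work can read d_idx directly. Both the cuBLAS call and this
    // kernel run on the default stream, so they execute in order.
    read_max<<<1, 1>>>(d_x, d_idx, d_out);

    // Copy back only to display the result in this demo.
    int h_idx = 0;
    float h_val = 0.0f;
    cudaMemcpy(&h_idx, d_idx, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&h_val, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("1-based index = %d, value = %f\n", h_idx, h_val);

    cublasDestroy(handle);
    cudaFree(d_x);
    cudaFree(d_idx);
    cudaFree(d_out);
    return 0;
}
```

Assuming a file name of isamax_device.cu, this should build with something like nvcc isamax_device.cu -lcublas. The final copies back to the host are there only to print the result; in a real pipeline the index would stay in d_idx and be consumed by subsequent kernels. If the signed maximum is needed rather than the largest magnitude, cublasIsamax is not the right tool and a custom reduction would be required instead.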


08-24 06:36