Problem description
I have the following simple CUDA Thrust code, which adds 10 to each element of a device vector, but the function is being called on the host side instead of on the device.
#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>
#include <stdio.h>
#include <thrust/device_vector.h>
__host__ __device__ int add(int x){
#if defined(__CUDA_ARCH__)
    printf("In device\n");
#else
    printf("In host\n");
#endif
    return x+10;
}
int main(void)
{
    thrust::host_vector<int> H(4);
    H[0] = H[1] = H[2] = H[3] = 10;
    thrust::device_vector<int> data=H;
    std::transform(data.begin(), data.end(), data.begin(),add);
    return 0;
}
What am I doing wrong here?
Recommended answer
[link text missing in source] provides good examples.
It looks like you have several issues, some of which have already been pointed out.
- If you want to use thrust, you should use thrust::transform, not std::transform. std::transform has no knowledge of the GPU, CUDA, or thrust, and will dispatch the host version of your add function. I'm not sure what exactly that would do when you pass a thrust::device_vector to it.
- Thrust algorithms need to use function objects (functors) rather than bare CUDA __device__ functions, for the reason indicated by Jared: the thrust algorithm in your source code is actually host code, and that host code cannot discover the address of a bare __device__ function. With this fix, you can be pretty certain that thrust will dispatch the device code path when working on device vectors.
Here is a modification of your code:
$ cat t856.cu
#include <stdio.h>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
struct my_func {
    __host__ __device__
    int operator()(int x){
#if defined(__CUDA_ARCH__)
        printf("In device, x is %d\n", x);
#else
        printf("In host, x is %d\n", x);
#endif
        return x+10;
    }
};
int main(void)
{
    thrust::host_vector<int> H(4);
    H[0] = H[1] = H[2] = H[3] = 10;
    thrust::device_vector<int> data=H;
    thrust::transform(data.begin(), data.end(), data.begin(),my_func());
    return 0;
}
$ nvcc -o t856 t856.cu
$ ./t856
In device, x is 10
In device, x is 10
In device, x is 10
In device, x is 10
$
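To see why the code in the question printed "In host": as far as I understand it, __CUDA_ARCH__ is only defined while nvcc compiles device code, so any call that originates in host code, such as the one std::transform makes, takes the #else branch. Below is a minimal host-only sketch of that path. The empty macro definitions (so it also builds with a plain C++ compiler) and the helper name add_ten_on_host are my own assumptions for illustration and are not part of the original code.

```cpp
#include <algorithm>
#include <stdio.h>
#include <vector>

// Assumption: when built without nvcc, define the CUDA qualifiers away so the
// same source still compiles with an ordinary C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

struct my_func {
    __host__ __device__
    int operator()(int x) const {
#if defined(__CUDA_ARCH__)
        printf("In device, x is %d\n", x);  // only present in device compilation passes
#else
        printf("In host, x is %d\n", x);    // branch taken by every host-side caller
#endif
        return x + 10;
    }
};

// Hypothetical helper: runs my_func on the host through std::transform,
// which knows nothing about CUDA and simply calls operator() on the CPU.
std::vector<int> add_ten_on_host(std::vector<int> v) {
    std::transform(v.begin(), v.end(), v.begin(), my_func());
    return v;
}
```

Calling add_ten_on_host on a std::vector<int> holding four 10s should return four 20s and go through the host branch each time. The same functor passed to thrust::transform over a thrust::device_vector goes through the device branch instead, as the session above shows.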