Question
I'm trying to pass the atomicAdd function into another function as a template parameter.
Here is my kernel 1:
template<typename T, typename TAtomic>
__global__ void myfunc1(T *address, TAtomic atomicFunc) {
    atomicFunc(address, 1);
}
Try 1:
myfunc1<<<1,1>>>(val.dev_ptr, atomicAdd);
It does not work because the compiler cannot match the expected function signature.
Try 2:
First, I wrap atomicAdd in a custom function called MyAtomicAdd.
template<typename T>
__device__ void MyAtomicAdd(T *address, T val) {
    atomicAdd(address, val);
}
Then I define a function pointer type called TAtomic and declare it as a template parameter.
typedef void (*TAtomic)(float *,float);
template<typename T, TAtomic atomicFunc>
__global__ void myfunc2(T *address) {
    atomicFunc(address, 1);
}
myfunc2<float, MyAtomicAdd><<<1,1>>>(dev_ptr);
CUDA_CHECK(cudaDeviceSynchronize());
Actually, Try 2 works. But I don't want to use a typedef; I need something more generic.
Try 3:
Just pass MyAtomicAdd to myfunc1.
myfunc1<<<1,1>>>(dev_ptr, MyAtomicAdd<float>);
CUDA_CHECK(cudaDeviceSynchronize());
The code compiles. But when I run the program, it reports an error:
"ERROR in /home/liang/groute-dev/samples/framework/pagerank.cu:70: invalid program counter (76)"
I'm just wondering: why doesn't Try 3 work? And is there a simple or gentle way to implement this requirement? Thank you.
Recommended answer
Try 3 doesn't work because you are attempting to take the address of a __device__ function in host code, which is illegal in CUDA:
myfunc1<<<1,1>>>(dev_ptr, MyAtomicAdd<float>);
                          ^
                          effectively a function pointer - address of a __device__ function
Such usage in CUDA will resolve to some sort of "address", but it is garbage, so when you try to use it as an actual function entry point in device code, you get the error you encountered: invalid program counter (or in some cases, just illegal address).
You can make your Try 3 method work (without a typedef) by wrapping the intrinsic in a functor instead of a bare __device__ function:
$ cat t48.cu
#include <stdio.h>

template<typename T>
__device__ void MyAtomicAdd(T *address, T val) {
    atomicAdd(address, val);
}

template <typename T>
struct myatomicadd
{
    __device__ T operator()(T *addr, T val){
        return atomicAdd(addr, val);
    }
};

template<typename T, typename TAtomic>
__global__ void myfunc1(T *address, TAtomic atomicFunc) {
    atomicFunc(address, (T)1);
}

int main(){
    int *dev_ptr;
    cudaMalloc(&dev_ptr, sizeof(int));
    cudaMemset(dev_ptr, 0, sizeof(int));
    // myfunc1<<<1,1>>>(dev_ptr, MyAtomicAdd<int>);
    myfunc1<<<1,1>>>(dev_ptr, myatomicadd<int>());
    int h = 0;
    cudaMemcpy(&h, dev_ptr, sizeof(int), cudaMemcpyDeviceToHost);
    printf("h = %d\n", h);
    return 0;
}
$ nvcc -arch=sm_35 -o t48 t48.cu
$ cuda-memcheck ./t48
========= CUDA-MEMCHECK
h = 1
========= ERROR SUMMARY: 0 errors
$
We can also write a slightly simpler version of this, letting the functor's template type be inferred from the kernel's template type:
$ cat t48.cu
#include <stdio.h>

struct myatomicadd
{
    template <typename T>
    __device__ T operator()(T *addr, T val){
        return atomicAdd(addr, val);
    }
};

template<typename T, typename TAtomic>
__global__ void myfunc1(T *address, TAtomic atomicFunc) {
    atomicFunc(address, (T)1);
}

int main(){
    int *dev_ptr;
    cudaMalloc(&dev_ptr, sizeof(int));
    cudaMemset(dev_ptr, 0, sizeof(int));
    myfunc1<<<1,1>>>(dev_ptr, myatomicadd());
    int h = 0;
    cudaMemcpy(&h, dev_ptr, sizeof(int), cudaMemcpyDeviceToHost);
    printf("h = %d\n", h);
    float *dev_ptrf;
    cudaMalloc(&dev_ptrf, sizeof(float));
    cudaMemset(dev_ptrf, 0, sizeof(float));
    myfunc1<<<1,1>>>(dev_ptrf, myatomicadd());
    float hf = 0;
    cudaMemcpy(&hf, dev_ptrf, sizeof(float), cudaMemcpyDeviceToHost);
    printf("hf = %f\n", hf);
    return 0;
}
$ nvcc -arch=sm_35 -o t48 t48.cu
$ cuda-memcheck ./t48
========= CUDA-MEMCHECK
h = 1
hf = 1.000000
========= ERROR SUMMARY: 0 errors
$
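If the compile-time approach from Try 2 is acceptable and only the typedef is the objection, one possible variation (a sketch, not something from the original answer) is to spell the function-pointer type in terms of T directly in the template parameter list. The helper name run_myfunc2_demo is hypothetical and exists only to make the example self-contained; error checking is omitted for brevity.

```cuda
#include <cstdio>

// Wrapper around the intrinsic, as in Try 2.
template <typename T>
__device__ void MyAtomicAdd(T *address, T val) {
    atomicAdd(address, val);
}

// The non-type parameter's type depends on T, so no typedef is needed.
// The function is bound at compile time; no run-time address is passed.
template <typename T, void (*atomicFunc)(T *, T)>
__global__ void myfunc2(T *address) {
    atomicFunc(address, T(1));
}

// Hypothetical helper (not in the original post): runs the kernel once
// on a zero-initialized float and returns the result to the host.
float run_myfunc2_demo() {
    float *dev_ptr;
    cudaMalloc(&dev_ptr, sizeof(float));
    cudaMemset(dev_ptr, 0, sizeof(float));
    // MyAtomicAdd resolves to MyAtomicAdd<float> from the parameter type.
    myfunc2<float, MyAtomicAdd><<<1, 1>>>(dev_ptr);
    float h = 0;
    cudaMemcpy(&h, dev_ptr, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_ptr);
    return h;
}

int main() {
    printf("h = %f\n", run_myfunc2_demo());
    return 0;
}
```

Compared with the functor, this keeps the call site of Try 2 but still fixes the signature to void(T *, T), so the functor remains the more flexible choice.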
More treatments of the use of device function pointers in CUDA are linked to this answer.
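As a rough illustration of the commonly used device-function-pointer pattern, the sketch below takes the address in device code (as the initializer of a __device__ variable) and copies it to the host with cudaMemcpyFromSymbol before passing it to the kernel. The names AtomicFn, d_atomic_fn, and run_device_pointer_demo are hypothetical, the alias is used only for readability, and error checking is omitted.

```cuda
#include <cstdio>

template <typename T>
__device__ void MyAtomicAdd(T *address, T val) {
    atomicAdd(address, val);
}

// Hypothetical alias for the wrapper's function-pointer type.
using AtomicFn = void (*)(int *, int);

// The address is captured on the device side, so it is a valid device entry point.
__device__ AtomicFn d_atomic_fn = MyAtomicAdd<int>;

template <typename T, typename TAtomic>
__global__ void myfunc1(T *address, TAtomic atomicFunc) {
    atomicFunc(address, (T)1);
}

// Hypothetical helper: fetches the device-side pointer value, launches the
// kernel with it, and returns the result to the host.
int run_device_pointer_demo() {
    int *dev_ptr;
    cudaMalloc(&dev_ptr, sizeof(int));
    cudaMemset(dev_ptr, 0, sizeof(int));

    // Copy the pointer value (not the function) from the device symbol.
    AtomicFn h_atomic_fn;
    cudaMemcpyFromSymbol(&h_atomic_fn, d_atomic_fn, sizeof(AtomicFn));

    myfunc1<<<1, 1>>>(dev_ptr, h_atomic_fn);
    int h = 0;
    cudaMemcpy(&h, dev_ptr, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_ptr);
    return h;
}

int main() {
    printf("h = %d\n", run_device_pointer_demo());
    return 0;
}
```

The host never dereferences h_atomic_fn; it only carries the value from the device symbol back into the kernel launch. The call goes through a run-time pointer, so the functor approach is usually easier for the compiler to inline.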