Problem description
Question more or less says it all.
calling a host function("std::pow<int, int> ") from a __device__/__global__ function("_calc_psd") is not allowed
From my understanding, this should be using the CUDA pow function instead, but it isn't.
Recommended answer
The error is exactly what the compiler reports. You can't use host functions in device code, and that includes the whole host C++ std library. CUDA ships its own math library, described in the programming guide, but you should call either pow or powf (the C standard library names, with no C++ std:: namespace). nvcc will resolve the call to the correct CUDA device function and inline the resulting code. Something like the following will work:
#include <math.h>

// Device-side helper: powf() resolves to CUDA's single-precision device pow.
__device__ float func(float x) {
    return x * x * powf(x, 0.123456f);
}
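If the call site really has int arguments, as the std::pow<int, int> in the error suggests, one option is to cast them so the call resolves to a floating-point overload that exists in device code. This is a minimal sketch under that assumption; `pow_as_float` and `pow_as_double` are hypothetical helper names, and the `#ifndef __CUDACC__` guard only exists so the same file also builds as plain C++ for a quick host-side check.

```cpp
#include <math.h>

// Outside nvcc, turn the CUDA qualifiers into no-ops so this compiles as host C++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: cast int arguments to float so the call resolves to
// the single-precision powf() that CUDA provides for device code.
__host__ __device__ float pow_as_float(int base, int exp) {
    return powf((float)base, (float)exp);
}

// Hypothetical helper: the same idea in double precision via pow(double, double).
__host__ __device__ double pow_as_double(int base, int exp) {
    return pow((double)base, (double)exp);
}
```

Keep in mind that a float only represents integers exactly up to 2^24, so the float variant can silently round large integer results; the double variant pushes that limit to 2^53.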
What I missed the first time is the template argument list reported in the error (std::pow<int, int>). Are you sure that you are passing either float or double arguments to pow? If you are passing integers, there is no matching overload in the CUDA standard library, which is probably why it fails. If you need an integer pow function, you will have to roll your own (or cast the arguments, but pow is a rather expensive function, and I am certain a short sequence of integer multiplications will be faster).
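A hand-rolled integer power could look like the following sketch, which uses exponentiation by squaring (O(log exp) multiplications) as one common way to do the "cascaded integer multiplication" mentioned above. The name `ipow` is hypothetical, it assumes a non-negative exponent, and as before the `#ifndef __CUDACC__` guard is only there so the code also compiles as ordinary C++.

```cpp
// Outside nvcc, turn the CUDA qualifiers into no-ops so this compiles as host C++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical integer power by repeated squaring.
// Assumes exp >= 0 (hence unsigned). Signed overflow is undefined behaviour,
// so the caller must keep the result within the range of long long.
__host__ __device__ long long ipow(long long base, unsigned int exp) {
    long long result = 1;
    while (exp != 0) {
        if (exp & 1u) {      // lowest bit set: multiply this power of base in
            result *= base;
        }
        exp >>= 1;
        if (exp != 0) {      // skip the final, unused squaring
            base *= base;
        }
    }
    return result;
}
```

Inside a kernel this would replace a call such as pow(k, 3) on int values with ipow(k, 3). For a small exponent known at compile time, writing the multiplications out directly (k * k * k) is simpler still.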
This concludes the question "PyCUDA: Pow within device code tries to use std::pow, fails".