Problem description
The essential question is: are alias templates supported by the CUDA compiler?
I am using CUDA 7.5 on Ubuntu with gcc-4.8. All of my template classes are defined in header files and #included into a single translation unit during compilation.
I have a simple cuda_array class that provides a thin wrapper around a std::vector. It's essentially a very simple version of thrust::host_vector combined with a thrust::device_vector. Its declaration is
template <typename T, const size_t N>
class cuda_array {
    std::vector<T> host;
    T *device;
public:
    // lots of type aliases to meet container requirements
    void push() { /* cudaMemcpy(...,H2D); */ }
    void pull() { /* cudaMemcpy(...,D2H); */ }
    // a few others that aren't relevant here
};
To make a matrix, I just made a quick template alias.
template <typename T, const size_t M, const size_t N>
using cuda_matrix = cuda_array<T, M * N>;
I want to map my matrix-vector multiplication CUDA kernel onto the overloaded operator* for type safety and easy use (it is left to the caller to ensure that push and pull are called correctly).
template <typename T, const size_t rows, const size_t cols>
__global__ void matrix_vector_mul(T *A, T *b, T *result) {
    __shared__ T shared_b[cols];
    // rest of it
}
template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_matrix<T, M, N> &m, cuda_array<T, N> &v) {
    cuda_array<T, M> result;
    matrix_vector_mul<T, M, N><<<16, 32>>>(m.device_data(), v.device_data(), result.device_data());
    return result;
}
In my 'main.cpp', I then have
cuda_matrix<int,16,32> A;
cuda_array<int,32> b;
auto result = A * b;
The last line fails to compile with the error
error: no operator "*" matches these operands
operand types are: cuda_matrix<int, 16UL, 32UL> * cuda_array<int, 32UL>
I chased down all of the usual suspects for template type deduction errors I could think of, but nothing worked. In desperation, I converted my cuda_matrix alias template into a class template.
template <typename T, const size_t M, const size_t N>
class cuda_matrix : public cuda_array<T, M * N> {};
And the compile error disappears! It therefore seems that CUDA does not yet support alias templates. Or did I do something silly that I can't figure out?
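The error is what the C++ standard requires rather than a gap in the CUDA compiler. Keep in mind § 14.5.7 [temp.alias]/p2: a template-id that names a specialization of an alias template is equivalent to the type obtained by substituting its template arguments into the alias's type-id, and the name of an alias template is never deduced.
This means that deduction is not performed for: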
template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_matrix<T, M, N> &m, cuda_array<T, N> &v)
but for:
template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_array<T, M * N> &m, cuda_array<T, N> &v)
//                                  ~~~~~~~~~~~~~~~~~~~^
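The equivalence can be checked with the host compiler alone. The following is a minimal sketch, assuming a stripped-down, host-only stand-in for cuda_array (the device pointer and the push/pull members are left out because they do not affect the types involved):

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Host-only stand-in for the question's cuda_array (assumption: the device
// buffer and member functions are omitted, only the template shape matters).
template <typename T, std::size_t N>
class cuda_array {
    std::vector<T> host;
};

template <typename T, std::size_t M, std::size_t N>
using cuda_matrix = cuda_array<T, M * N>;

// The alias does not create a new type: it is replaced by cuda_array<int, 512>.
static_assert(std::is_same<cuda_matrix<int, 16, 32>, cuda_array<int, 512>>::value,
              "the alias names the underlying cuda_array type");

// Two different shapes with the same product collapse to the same type,
// so M and N cannot be recovered from an argument of that type.
static_assert(std::is_same<cuda_matrix<int, 16, 32>, cuda_matrix<int, 32, 16>>::value,
              "M and N are not part of the type");
```

Because a 16x32 and a 32x16 matrix are literally the same type here, the compiler has no way to pick M from the argument alone.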
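And so, per § 14.8.2.5 [temp.deduct.type]/p16, a non-type template parameter that is used inside an expression in the function parameter list (here M in M * N) cannot be deduced from that expression. M is in a non-deduced context and appears nowhere else in the parameter list, so deduction fails and this operator* is not considered as a viable overload.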
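The failed deduction can be reproduced without any CUDA code. The sketch below assumes an empty host-only stand-in for cuda_array; can_multiply and multiply_explicit are hypothetical helpers introduced only for this illustration. It shows that the operator is rejected when M has to be deduced, and that the same operator is usable once the template arguments are spelled out.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Host-only stand-in for the question's cuda_array (assumption: storage omitted).
template <typename T, std::size_t N>
struct cuda_array {};

template <typename T, std::size_t M, std::size_t N>
using cuda_matrix = cuda_array<T, M * N>;

// Same signature as in the question; the body is irrelevant to overload resolution.
template <typename T, std::size_t M, std::size_t N>
cuda_array<T, M> operator*(cuda_matrix<T, M, N>&, cuda_array<T, N>&) { return {}; }

// Hypothetical detection trait: true when `A& * B&` is a well-formed expression.
template <typename A, typename B, typename = void>
struct can_multiply : std::false_type {};

template <typename A, typename B>
struct can_multiply<A, B,
    decltype(void(std::declval<A&>() * std::declval<B&>()))> : std::true_type {};

// M appears only inside M * N, so it cannot be deduced and the overload is dropped.
static_assert(!can_multiply<cuda_matrix<int, 16, 32>, cuda_array<int, 32>>::value,
              "operator* is not viable when M must be deduced");

// Hypothetical helper: explicit template arguments bypass deduction entirely.
inline cuda_array<int, 16> multiply_explicit(cuda_matrix<int, 16, 32>& m,
                                             cuda_array<int, 32>& v) {
    return operator*<int, 16, 32>(m, v);
}
```

Calling operator* with explicit arguments defeats the purpose of an overloaded operator, which is why the workarounds restructure the signature instead.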
As one of the workarounds, you can deduce the sizes of the cuda_array arguments themselves and verify them instead:
template <typename T, std::size_t MN, std::size_t N>
auto operator*(const cuda_array<T, MN>& m, const cuda_array<T, N>& v)
-> typename std::enable_if<(MN/N)*N==MN, cuda_array<T, MN/N>>::type;
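To see the enable_if workaround end to end, here is a host-only sketch. It assumes a simplified cuda_array with a public host vector and replaces the kernel launch with a plain row-major loop, so it illustrates the deduction rather than the question's actual CUDA implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Host-only stand-in for the question's cuda_array (assumption: no device
// buffer; `host` is public only to keep the sketch short).
template <typename T, std::size_t N>
struct cuda_array {
    std::vector<T> host = std::vector<T>(N);
};

template <typename T, std::size_t M, std::size_t N>
using cuda_matrix = cuda_array<T, M * N>;

// MN and N are both deducible; the row count is recovered as MN / N.
// enable_if removes the overload when MN is not a multiple of N.
template <typename T, std::size_t MN, std::size_t N>
auto operator*(const cuda_array<T, MN>& m, const cuda_array<T, N>& v)
    -> typename std::enable_if<(MN / N) * N == MN, cuda_array<T, MN / N>>::type
{
    assert(m.host.size() == MN && v.host.size() == N);
    cuda_array<T, MN / N> result;
    // Plain loop standing in for the matrix_vector_mul kernel launch.
    for (std::size_t row = 0; row < MN / N; ++row) {
        T sum = T();
        for (std::size_t col = 0; col < N; ++col)
            sum += m.host[row * N + col] * v.host[col];
        result.host[row] = sum;
    }
    return result;
}

// Usage: a 2x3 matrix times a 3-element vector yields a 2-element vector.
inline cuda_array<int, 2> example_product() {
    cuda_matrix<int, 2, 3> A;
    A.host = {1, 2, 3, 4, 5, 6};
    cuda_array<int, 3> b;
    b.host = {1, 0, 2};
    return A * b;
}
```

Note that the check only verifies divisibility: a cuda_matrix<int, 16, 32> multiplied by a 16-element vector is still accepted and silently treated as a 32x16 matrix, so this variant gives weaker type safety than a distinct matrix type.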
or use the inheritance trick that you already have; then M and N are separate non-type template parameters of cuda_matrix.
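The inheritance variant can be sketched the same way. Again this assumes a simplified host-only cuda_array with a public host vector and a plain loop in place of the kernel launch; the parameters are taken by const reference here, which differs from the question's non-const references.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Host-only stand-in for the question's cuda_array (assumption: no device buffer).
template <typename T, std::size_t N>
struct cuda_array {
    std::vector<T> host = std::vector<T>(N);
};

// A distinct class template: M and N are now its own template parameters,
// so both can be deduced directly from a cuda_matrix argument.
template <typename T, std::size_t M, std::size_t N>
struct cuda_matrix : cuda_array<T, M * N> {};

template <typename T, std::size_t M, std::size_t N>
cuda_array<T, M> operator*(const cuda_matrix<T, M, N>& m, const cuda_array<T, N>& v)
{
    assert(m.host.size() == M * N && v.host.size() == N);
    cuda_array<T, M> result;
    // Plain loop standing in for the matrix_vector_mul kernel launch.
    for (std::size_t row = 0; row < M; ++row) {
        T sum = T();
        for (std::size_t col = 0; col < N; ++col)
            sum += m.host[row * N + col] * v.host[col];
        result.host[row] = sum;
    }
    return result;
}

// Usage: M = 2 and N = 3 are deduced from the matrix type itself.
inline cuda_array<int, 2> example_product() {
    cuda_matrix<int, 2, 3> A;
    A.host = {1, 2, 3, 4, 5, 6};
    cuda_array<int, 3> b;
    b.host = {1, 0, 2};
    return A * b;
}
```

With this design cuda_matrix<int, 2, 3> and cuda_matrix<int, 3, 2> are different types, so a shape mismatch between matrix and vector is rejected at compile time. The trade-off is that a plain cuda_array<T, M * N> no longer converts implicitly to a cuda_matrix.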