C++11 alias templates in CUDA

Problem description



The essential question is: are alias templates supported by the CUDA compiler?

I am using CUDA 7.5 on Ubuntu with gcc-4.8. All of my template classes are defined in header files and #included into a single translation unit during compilation.

I have a simple cuda_array class that provides a thin wrapper around a std::vector. It's essentially a very simple version of thrust::host_vector combined with a thrust::device_vector. Its declaration is

template <typename T, const size_t N>
class cuda_array {
    std::vector<T> host;
    T *device;
public:
    // lots of type aliases to meet container requirements
    void push() { /* cudaMemcpy(...,H2D); */ }
    void pull() { /* cudaMemcpy(...,D2H); */ }
    // a few others that aren't relevant here
};

To make a matrix, I just made a quick template alias.

template <typename T, const size_t M, const size_t N>
using cuda_matrix = cuda_array<T, M * N>;

I want to map my matrix-vector multiplication CUDA kernel onto the overloaded operator* for type safety and easy use (it is left to the caller to ensure that push and pull are called correctly).

template <typename T, const size_t rows, const size_t cols>
__global__ void matrix_vector_mul(T *A, T *b, T *result) {
     __shared__ T shared_b[cols];
    // rest of it
}

template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_matrix<T, M, N> &m, cuda_array<T, N> &v) {
    cuda_array<T, M> result;
    matrix_vector_mul<T, M, N><<<16, 32>>>(m.device_data(), v.device_data(), result.device_data());
    return result;
}

In my 'main.cpp', I then have

cuda_matrix<int,16,32> A;
cuda_array<int,32> b;
auto result = A * b;

The last line throws an error saying

error: no operator "*" matches these operands
        operand types are: cuda_matrix<int, 16UL, 32UL> * cuda_array<int, 32UL>

I chased down all of the usual suspects for template type deduction errors I could think of, but nothing worked. In desperation, I converted my cuda_matrix alias template into a template class.

template <typename T, const size_t M, const size_t N>
class cuda_matrix : public cuda_array<T, M * N> {};

And the compile error disappears! It therefore seems that CUDA does not yet support alias templates. Or did I do something silly that I can't figure out?

Solution

You must remember that:

§ 14.5.7 [temp.alias]/p2:

"When a template-id refers to the specialization of an alias template, it is equivalent to the associated type obtained by substitution of its template-arguments for the template-parameters in the type-id of the alias template. [Note: An alias template name is never deduced. -- end note]"

This means that deduction is not performed for:

template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_matrix<T, M, N> &m, cuda_array<T, N> &v)

but for:

template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_array<T, M * N> &m, cuda_array<T, N> &v)
//                                  ~~~~~~~~~~~~~~~~~~~^

And so:

§ 14.8.2.5 [temp.deduct.type]/p16:

"[Note: If, in the declaration of a function template with a non-type template-parameter, the non-type template-parameter is used in a subexpression in the function parameter-list, the expression is a non-deduced context as specified above. -- end note]"

Here M * N is such a subexpression, so M is in a non-deduced context, and hence this operator* is not considered a viable overload.

As one of the workarounds, you can instead verify the deduced value for cuda_array itself:

template <typename T, std::size_t MN, std::size_t N>
auto operator*(const cuda_array<T, MN>& m, const cuda_array<T, N>& v)
    -> typename std::enable_if<(MN/N)*N==MN, cuda_array<T, MN/N>>::type;

or use the inheritance trick that you already have; then M and N are separate non-type template parameters of cuda_matrix.
