C++11 alias templates in CUDA

Problem description

The essential question is: are alias templates supported by the CUDA compiler?
I am using CUDA 7.5 on Ubuntu with gcc-4.8. All of my template classes are defined in header files and #included into a single translation unit during compilation.
I have a simple cuda_array class that provides a thin wrapper around a std::vector. It's essentially a very simple version of thrust::host_vector combined with a thrust::device_vector. Its declaration is
template <typename T, const size_t N>
class cuda_array {
    std::vector<T> host;
    T *device;
public:
    // lots of type aliases to meet container requirements
    void push() { /* cudaMemcpy(...,H2D); */ }
    void pull() { /* cudaMemcpy(...,D2H); */ }
    // a few others that aren't relevant here
};
To make a matrix, I just made a quick template alias.
template <typename T, const size_t M, const size_t N>
using cuda_matrix = cuda_array<T, M * N>;
I want to map my matrix-vector multiplication CUDA kernel onto the overloaded operator* for type safety and easy use (it is left to the caller to ensure that push and pull are called correctly).
template <typename T, const size_t rows, const size_t cols>
__global__ void matrix_vector_mul(T *A, T *b, T *result) {
    __shared__ T shared_b[cols];
    // rest of it
}

template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_matrix<T, M, N> &m, cuda_array<T, N> &v) {
    cuda_array<T, M> result;
    matrix_vector_mul<T, M, N><<<16, 32>>>(m.device_data(), v.device_data(), result.device_data());
    return result;
}
In my 'main.cpp', I then have
cuda_matrix<int,16,32> A;
cuda_array<int,32> b;
auto result = A * b;
The last line throws an error saying
error: no operator "*" matches these operands
        operand types are: cuda_matrix<int, 16UL, 32UL> * cuda_array<int, 32UL>
I chased down all of the usual suspects for template type deduction errors I could think of, but nothing worked. In desperation, I converted my cuda_matrix alias template into a template class.
template <typename T, const size_t M, const size_t N>
class cuda_matrix : public cuda_array<T, M * N> {};
And the compile error disappears! It therefore seems that CUDA does not yet support alias templates. Or did I do something silly that I can't figure out?
Solution
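You must remember that an alias template is only another name for the type it aliases:
§ 14.5.7 [temp.alias]/p2 (in paraphrase): a template-id that refers to a specialization of an alias template is equivalent to the type obtained by substituting its template arguments into the alias's type-id, and an alias template name is never deduced.
This means that deduction is not performed for: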
template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_matrix<T, M, N> &m, cuda_array<T, N> &v)
but for:
template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_array<T, M * N> &m, cuda_array<T, N> &v)
//                                  ~~~~~~~~~~~~~~~~~~~^
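And so:
§ 14.8.2.5 [temp.deduct.type]/p16 (in paraphrase): a non-type template parameter that is used inside an expression in the function parameter list is not deduced from that expression; its argument must be explicitly specified or deduced elsewhere.
M appears only inside the expression M * N, so it is in a non-deduced context and nothing else in the parameter list can supply it. Deduction therefore fails, and this operator* is not a viable overload. This is standard C++ behavior, not a missing CUDA feature.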
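The failure can be reproduced without CUDA at all. The following is a minimal host-only sketch, not the asker's code: array_like, matrix_alias, make_void and can_multiply are hypothetical names introduced here for illustration. It checks at compile time that the alias names the same type as the array, that `A * b` finds no viable operator*, and that the same operator works once the template arguments are spelled out explicitly.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical host-only stand-in for cuda_array; no CUDA toolkit needed.
template <typename T, std::size_t N>
struct array_like {};

// Same shape as the question's alias: matrix_alias<T, M, N> is just another
// name for array_like<T, M * N>.
template <typename T, std::size_t M, std::size_t N>
using matrix_alias = array_like<T, M * N>;

// After alias substitution the first parameter is array_like<T, M * N>&.
// N can still be deduced from the second parameter, but M appears only
// inside the expression M * N, so it is never deduced.
template <typename T, std::size_t M, std::size_t N>
array_like<T, M> operator*(matrix_alias<T, M, N>&, array_like<T, N>&) {
    return {};
}

// C++11 detection helper: can_multiply<A, B>::value is true only if
// `a * b` compiles for lvalues a of type A and b of type B.
template <typename...>
struct make_void { using type = void; };

template <typename A, typename B, typename = void>
struct can_multiply : std::false_type {};

template <typename A, typename B>
struct can_multiply<A, B,
    typename make_void<decltype(std::declval<A&>() * std::declval<B&>())>::type>
    : std::true_type {};

// The alias and the underlying array are one and the same type.
static_assert(std::is_same<matrix_alias<int, 16, 32>, array_like<int, 512>>::value,
              "alias template names the aliased type");

// With deduction, the operator is not viable because M cannot be deduced.
static_assert(!can_multiply<matrix_alias<int, 16, 32>, array_like<int, 32>>::value,
              "operator* is not viable when M must be deduced");

// With explicit template arguments, no deduction is needed and the call works.
static_assert(std::is_same<
                  decltype(::operator*<int, 16, 32>(
                      std::declval<array_like<int, 512>&>(),
                      std::declval<array_like<int, 32>&>())),
                  array_like<int, 16>>::value,
              "explicit template arguments bypass deduction");
```

Since a plain host compiler rejects the deduced call in the same way, the sketch supports the reading that the error comes from the language rules rather than from the CUDA compiler.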
As one of the workarounds, you can instead deduce the total size MN of cuda_array itself and verify that it is a multiple of N:
template <typename T, std::size_t MN, std::size_t N>
auto operator*(const cuda_array<T, MN>& m, const cuda_array<T, N>& v)
    -> typename std::enable_if<(MN/N)*N==MN, cuda_array<T, MN/N>>::type;
or use the inheritance trick that you already have; then M and N are separate non-type template parameters of cuda_matrix and can be deduced directly from the argument's type.
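Both workarounds can be sketched together in host-only C++. This is an illustration under assumptions, not the asker's real classes: cuda_array is reduced to an empty stand-in, the operators return a default-constructed result instead of launching the kernel, and cuda_matrix_class, make_void and can_multiply are hypothetical names introduced here so that the alias and the derived class can coexist in one file.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical host-only stand-in for the question's cuda_array; the real
// class would also own the host and device buffers.
template <typename T, std::size_t N>
struct cuda_array {};

// Workaround 1: keep the alias, deduce the product MN directly, and recover
// the row count as MN / N. enable_if removes the overload when N does not
// divide MN evenly.
template <typename T, std::size_t M, std::size_t N>
using cuda_matrix = cuda_array<T, M * N>;

template <typename T, std::size_t MN, std::size_t N>
auto operator*(const cuda_array<T, MN>&, const cuda_array<T, N>&)
    -> typename std::enable_if<(MN / N) * N == MN, cuda_array<T, MN / N>>::type {
    return {};  // a real implementation would launch the kernel here
}

// Workaround 2: a distinct class template (hypothetical name), so M and N
// are template parameters of the argument's own type and can be deduced.
template <typename T, std::size_t M, std::size_t N>
struct cuda_matrix_class : cuda_array<T, M * N> {};

template <typename T, std::size_t M, std::size_t N>
cuda_array<T, M> operator*(cuda_matrix_class<T, M, N>&, cuda_array<T, N>&) {
    return {};  // a real implementation would launch the kernel here
}

// C++11 detection helper: true only if `a * b` compiles for lvalues.
template <typename...>
struct make_void { using type = void; };

template <typename A, typename B, typename = void>
struct can_multiply : std::false_type {};

template <typename A, typename B>
struct can_multiply<A, B,
    typename make_void<decltype(std::declval<A&>() * std::declval<B&>())>::type>
    : std::true_type {};

// Workaround 1: a 16 x 32 alias matrix times a 32-vector gives a 16-vector.
static_assert(std::is_same<
                  decltype(std::declval<cuda_matrix<int, 16, 32>&>() *
                           std::declval<cuda_array<int, 32>&>()),
                  cuda_array<int, 16>>::value,
              "MN = 512 and N = 32 are deduced, result has 512 / 32 elements");

// Workaround 1 rejects sizes where N does not divide MN.
static_assert(!can_multiply<cuda_array<int, 10>, cuda_array<int, 3>>::value,
              "10 is not a multiple of 3");

// Workaround 2: M and N are deduced from the derived class itself.
static_assert(std::is_same<
                  decltype(std::declval<cuda_matrix_class<float, 4, 8>&>() *
                           std::declval<cuda_array<float, 8>&>()),
                  cuda_array<float, 4>>::value,
              "M = 4 and N = 8 are deduced from cuda_matrix_class");
```

One trade-off to keep in mind: with the first workaround the compiler cannot tell a 16 x 32 matrix from any other 512-element cuda_array, so the operator accepts every array whose size is a multiple of N. The derived class keeps the matrix shape in the type, at the cost of being a different type from cuda_array<T, M * N>.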