cuda - cublasSetVector() vs. cudaMemcpy()

I am wondering whether there is any difference between the following two calls:

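// cumalloc.c - Create a vector on the device
#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

#ifndef HOST
#define HOST /* assumption: the original macro is not shown; defined empty here */
#endif

HOST float * cudamath_vector(const float * h_vector, const int m)
{
  float *d_vector = NULL;
  cudaError_t cudaStatus;

  /* Allocate room for m floats on the device. */
  cudaStatus = cudaMalloc((void **)&d_vector, sizeof(float) * m);

  /* Treat any failure as fatal, not only cudaErrorMemoryAllocation. */
  if(cudaStatus != cudaSuccess) {
    printf("ERROR: cumalloc.cu, cudamath_vector() : %s\n", cudaGetErrorString(cudaStatus));
    return NULL;
  }

  /* Both calls copy m floats from host to device; only one of them is needed. */

  /*    THIS: */ cublasSetVector(m, sizeof(*d_vector), h_vector, 1, d_vector, 1);

  /* OR THAT: */ cudaMemcpy(d_vector, h_vector, sizeof(float) * m, cudaMemcpyHostToDevice);

  return d_vector;
}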

cublasSetVector() has two arguments, incx and incy, and the documentation says:

In the NVIDIA forum, someone said:

So does this mean that for incx = incy = 1 all elements of a float[] will be sizeof(float)-aligned, while for incx = incy = 2 there will be a sizeof(float) padding between consecutive elements?
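To make the stride question concrete, here is a minimal host-only sketch of how a strided copy is conventionally defined in BLAS-style interfaces: element i is read from x[i * incx] and written to y[i * incy], with strides counted in elements rather than bytes. The helper name strided_copy is hypothetical. It only models the semantics on the host and is not how cuBLAS is implemented.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side model of a BLAS-style strided vector copy.
 * Copies n elements of elem_size bytes each. Strides are counted in
 * elements, not bytes. Assumes incx >= 1 and incy >= 1. */
static void strided_copy(int n, size_t elem_size,
                         const void *x, int incx,
                         void *y, int incy)
{
    const unsigned char *src = (const unsigned char *)x;
    unsigned char *dst = (unsigned char *)y;

    for (int i = 0; i < n; ++i) {
        /* Element i lives at index i * inc in each array. */
        memcpy(dst + (size_t)i * incy * elem_size,
               src + (size_t)i * incx * elem_size,
               elem_size);
    }
}
```

Under this model, incx = incy = 1 is a plain contiguous copy of n elements. With incx = 2 and incy = 1, every second source element is gathered into a contiguous destination. With incy = 2, the destination slots in between are left untouched.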

Apart from these two parameters and the cublasHandle, does cublasSetVector() do anything else that cudaMalloc() does not?

Is it safe to pass vectors/matrices that were not created with their respective cublas*() functions to other CUBLAS functions that operate on them?

Best answer

A comment by Massimiliano Fatica in a thread of the NVIDIA Forum confirms what I said in my comment above (or, better put, my comment came from recalling the post I linked to). In particular:

Accordingly, you can safely pass any array created by cudaMalloc as input to cublasSetVector.

Regarding the strides, the guide (as of CUDA 6.0) may contain a misprint, since it states:

but it should perhaps read: