Problem description
I need to compute the element-wise multiplication (Hadamard product) of two vectors of complex numbers with NVIDIA CUBLAS. Unfortunately, there is no HAD operation in CUBLAS. Apparently you can do this with the SBMV operation, but CUBLAS does not implement it for complex numbers. I cannot believe there is no way to achieve this with CUBLAS. Is there any other way to do it with CUBLAS for complex numbers?
I cannot write my own kernel; I have to use CUBLAS (or another standard NVIDIA library, if it is really not possible with CUBLAS).
Recommended answer
CUBLAS is based on the reference BLAS, and the reference BLAS has never contained a Hadamard product (complex or real). Hence CUBLAS doesn't have one either. Intel has added v?Mul to MKL for this, but it is non-standard and not in most BLAS implementations. It is the kind of operation that an old-school Fortran programmer would just write a loop for, so I presume it never warranted a dedicated routine in BLAS.
There is no "standard" CUDA library I am aware of which implements a Hadamard product. It would be possible to use CUBLAS GEMM or SYMM to do this and then extract the diagonal of the resulting matrix, but that would be horribly inefficient, from both a computation and a storage standpoint.
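To make that cost concrete, here is a minimal host-side C++ sketch of one reading of the GEMM approach: treat x as an n-by-1 matrix and y as a 1-by-n matrix, form the full n-by-n product, and keep only its diagonal. The helper name hadamard_via_outer_product and the cfloat alias are hypothetical, and the loops only stand in for what a GEMM call would compute. This is not CUBLAS code.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical helper (not a CUBLAS routine): builds the full n-by-n outer
// product x * y^T, as a GEMM with inner dimension 1 would, then keeps only
// the diagonal entries, which are x[i] * y[i].
std::vector<cfloat> hadamard_via_outer_product(const std::vector<cfloat>& x,
                                               const std::vector<cfloat>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();

    std::vector<cfloat> full(n * n);              // O(n^2) temporary storage
    for (std::size_t j = 0; j < n; ++j) {         // column-major, as in BLAS
        for (std::size_t i = 0; i < n; ++i) {
            full[i + j * n] = x[i] * y[j];        // O(n^2) multiplications
        }
    }

    std::vector<cfloat> z(n);
    for (std::size_t i = 0; i < n; ++i) {
        z[i] = full[i + i * n];                   // only n of the n^2 entries are used
    }
    return z;
}
```

For vectors of length n this performs n*n complex multiplications and allocates n*n temporaries in order to keep n results, which is why the approach is impractical for anything but tiny vectors.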
The Thrust template library can do this trivially using thrust::transform, for example:
thrust::multiplies<thrust::complex<float> > op;
thrust::transform(thrust::device, x, x + n, y, z, op);
would iterate over each pair of inputs from the device pointers x and y and calculate z[i] = x[i] * y[i] (there are probably a couple of casts you need to add to make that compile, but you get the idea). But that effectively requires compiling CUDA code within your project, and apparently you don't want that.
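If you want a host-side reference to check GPU results against, the same operation can be written with the standard library, whose std::transform and std::multiplies the Thrust call mirrors. This is only a sketch under that assumption: it runs on the CPU over std::vector rather than device pointers, and the name hadamard_host and the cfloat alias are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <functional>
#include <vector>

using cfloat = std::complex<float>;

// Host-side reference only: same call shape as the thrust::transform line,
// but executed on the CPU with standard-library types.
std::vector<cfloat> hadamard_host(const std::vector<cfloat>& x,
                                  const std::vector<cfloat>& y) {
    assert(x.size() == y.size());
    std::vector<cfloat> z(x.size());
    // Binary transform: z[i] = x[i] * y[i] for every index i.
    std::transform(x.begin(), x.end(), y.begin(), z.begin(),
                   std::multiplies<cfloat>());
    return z;
}
```

Unlike the diagonal-extraction route, this touches each element exactly once and needs no temporary storage beyond the output vector.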