Problem description
I need to compute the element-wise multiplication (Hadamard product) of two vectors of complex numbers with NVIDIA CUBLAS. Unfortunately, there is no HAD operation in CUBLAS. Apparently you can do this with the SBMV operation, but it is not implemented for complex numbers in CUBLAS. I cannot believe there is no way to achieve this with CUBLAS. Is there any other way to do it with CUBLAS, for complex numbers?
I cannot write my own kernel; I have to use CUBLAS (or another standard NVIDIA library, if it really is not possible with CUBLAS).
Recommended answer
CUBLAS is based on the reference BLAS, and the reference BLAS has never contained a Hadamard product (complex or real). Hence CUBLAS doesn't have one either. Intel has added v?Mul to MKL for doing this, but it is non-standard and not in most BLAS implementations. It is the kind of operation that an old-school Fortran programmer would just write a loop for, so I presume it really didn't warrant a dedicated routine in BLAS.
There is no "standard" CUDA library I am aware of which implements a Hadamard product. It would be possible to use CUBLAS GEMM or SYMM to do this and extract the diagonal of the resulting matrix, but that would be horribly inefficient, both from a computation and a storage standpoint.
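To make the cost of the GEMM route concrete, here is a minimal host-only C++ sketch of what it amounts to: form the full n x n outer product of x and y, then keep only the diagonal. This is not CUBLAS code; the function name hadamard_via_outer_product and the cfloat alias are made up for illustration, and the column-major layout is only assumed to mirror what a GEMM call would produce.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Hadamard product the expensive way: build the full n x n outer product
// C = x * y^T (plain transpose, no conjugation), then keep only the diagonal.
// Hypothetical helper for illustration; it does not call CUBLAS.
std::vector<cfloat> hadamard_via_outer_product(const std::vector<cfloat>& x,
                                               const std::vector<cfloat>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();

    // n * n temporaries are stored and computed to obtain n useful results.
    std::vector<cfloat> c(n * n);
    for (std::size_t col = 0; col < n; ++col) {
        for (std::size_t row = 0; row < n; ++row) {
            c[row + col * n] = x[row] * y[col];  // column-major, as in BLAS
        }
    }

    // Diagonal entry C(i, i) = x[i] * y[i] is the Hadamard product.
    std::vector<cfloat> z(n);
    for (std::size_t i = 0; i < n; ++i) {
        z[i] = c[i + i * n];
    }
    return z;
}
```

For vectors of length n this does n * n multiplications and needs n * n elements of scratch storage, which is the inefficiency described above. Note that the transpose must not conjugate y, otherwise the diagonal would hold x[i] * conj(y[i]) instead.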
The Thrust template library can do this trivially using thrust::transform, for example:
thrust::multiplies<thrust::complex<float> > op;
thrust::transform(thrust::device, x, x + n, y, z, op);
would iterate over each pair of inputs from the device pointers x and y and calculate z[i] = x[i] * y[i] (there are probably a couple of casts you need to add to make that compile, but you get the idea). But that effectively requires compiling CUDA code within your project, and apparently you don't want that.
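For comparison, the same element-wise operation can be sketched on the host with the standard library, where std::transform and std::multiplies play the roles of thrust::transform and thrust::multiplies. This is only a CPU reference under that assumption (for example, to check device results against); the function name hadamard and the cfloat alias are illustrative and not part of Thrust or CUBLAS.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <functional>
#include <vector>

using cfloat = std::complex<float>;

// Host-side reference: z[i] = x[i] * y[i] for every i.
// Hypothetical helper mirroring the thrust::transform call above.
std::vector<cfloat> hadamard(const std::vector<cfloat>& x,
                             const std::vector<cfloat>& y) {
    assert(x.size() == y.size());
    std::vector<cfloat> z(x.size());
    // Binary transform: walks x and y in lockstep and writes products to z.
    std::transform(x.begin(), x.end(), y.begin(), z.begin(),
                   std::multiplies<cfloat>());
    return z;
}
```

Calling hadamard on two equal-length vectors returns a vector of the same length; the work and storage are both linear in n, in contrast to the GEMM-and-diagonal approach.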