问题描述
我想用一个FFT从double到std :: complex与CuFFT Lib。我的代码看起来像
i want to make a FFT from double to std::complex with the CuFFT Lib. My Code looks like
#include <complex>
#include <iostream>
#include <cufft.h>
#include <cuda_runtime_api.h>
typedef std::complex<double> Complex;
using namespace std;
int main(){
int n = 100;
double* in;
Complex* out;
in = (double*) malloc(sizeof(double) * n);
out = (Complex*) malloc(sizeof(Complex) * n/2+1);
for(int i=0; i<n; i++){
in[i] = 1;
}
cufftHandle plan;
plan = cufftPlan1d(&plan, n, CUFFT_D2Z, 1);
unsigned int mem_size = sizeof(double)*n;
cufftDoubleReal *d_in;
cufftDoubleComplex *d_out;
cudaMalloc((void **)&d_in, mem_size);
cudaMalloc((void **)&d_out, mem_size);
cudaMemcpy(d_in, in, mem_size, cudaMemcpyHostToDevice);
cudaMemcpy(d_out, out, mem_size, cudaMemcpyHostToDevice);
int succes = cufftExecD2Z(plan,(cufftDoubleReal *) d_in,(cufftDoubleComplex *) d_out);
cout << succes << endl;
cudaMemcpy(out, d_out, mem_size, cudaMemcpyDeviceToHost);
for(int i=0; i<n/2; i++){
cout << "out: " << i << " " << out[i].real() << " " << out[i].imag() << endl;
}
return 0;
}
但是在我看来,这一定是错误的,因为我认为转换值应该是1 0 0 0 0 ....或没有标准化100 0 0 0 0 ....但我只是得到0 0 0 0 0 ...
but it seems to me this must be wrong, because i think the transformed values should be 1 0 0 0 0 .... or without the normalization 100 0 0 0 0 .... but i just get 0 0 0 0 0 ...
此外,我想更多如果cufftExecD2Z将工作到位,这应该是可能的,但我还没有弄清楚如何正确这样做。有人可以帮助吗?
Furthermore i would like it more if the cufftExecD2Z would work in place, which should be possible but i haven't figured out how to correctly do so. Can anybody help?
推荐答案
您的代码有各种错误。您应该查看以及示例代码。
Your code has a variety of errors. You should probably review cufft documentation as well as the sample codes.
- 您应该对所有API返回值进行正确的cuda错误检查和适当的cufft错误检查。
-
cufftPlan1d
函数的返回值不在计划中:
- You should do proper cuda error checking and proper cufft error checking on all API return values.
The return value of the
cufftPlan1d
function does not go into the plan:
plan = cufftPlan1d(&plan, n, CUFFT_D2Z, 1);
函数本身设置计划(这就是为什么你通过& plan
到函数),那么当你将返回值赋给计划时,它会破坏函数设置的计划。
The function itself sets the plan (that is why you pass &plan
to the function), then when you assign the return value into the plan, it ruins the plan set up by the function.
你正确地识别出输出可以是大小((N / 2)+1)
,但是你没有在主机端正确分配空间:
You correctly identified that the output can be of size ((N/2)+1)
, but then you didn't allocate space for it properly either on the host side:
out = (Complex*) malloc(sizeof(Complex) * n/2+1);
或在设备端:
unsigned int mem_size = sizeof(double)*n;
...
cudaMalloc((void **)&d_out, mem_size);
以下代码具有一些上述问题固定,足以获得所需的结果(100,0,0,...)
The following code has some of the above problems fixed, enough to get your desired result (100, 0, 0, ...)
#include <complex>
#include <iostream>
#include <cufft.h>
#include <cuda_runtime_api.h>
#define cudaCheckErrors(msg) \
do { \
cudaError_t __err = cudaGetLastError(); \
if (__err != cudaSuccess) { \
fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
msg, cudaGetErrorString(__err), \
__FILE__, __LINE__); \
fprintf(stderr, "*** FAILED - ABORTING\n"); \
exit(1); \
} \
} while (0)
typedef std::complex<double> Complex;
using namespace std;
int main(){
int n = 100;
double* in;
Complex* out;
#ifdef IN_PLACE
in = (double*) malloc(sizeof(Complex) * (n/2+1));
out = (Complex*)in;
#else
in = (double*) malloc(sizeof(double) * n);
out = (Complex*) malloc(sizeof(Complex) * (n/2+1));
#endif
for(int i=0; i<n; i++){
in[i] = 1;
}
cufftHandle plan;
cufftResult res = cufftPlan1d(&plan, n, CUFFT_D2Z, 1);
if (res != CUFFT_SUCCESS) {cout << "cufft plan error: " << res << endl; return 1;}
cufftDoubleReal *d_in;
cufftDoubleComplex *d_out;
unsigned int out_mem_size = (n/2 + 1)*sizeof(cufftDoubleComplex);
#ifdef IN_PLACE
unsigned int in_mem_size = out_mem_size;
cudaMalloc((void **)&d_in, in_mem_size);
d_out = (cufftDoubleComplex *)d_in;
#else
unsigned int in_mem_size = sizeof(cufftDoubleReal)*n;
cudaMalloc((void **)&d_in, in_mem_size);
cudaMalloc((void **)&d_out, out_mem_size);
#endif
cudaCheckErrors("cuda malloc fail");
cudaMemcpy(d_in, in, in_mem_size, cudaMemcpyHostToDevice);
cudaCheckErrors("cuda memcpy H2D fail");
res = cufftExecD2Z(plan,d_in, d_out);
if (res != CUFFT_SUCCESS) {cout << "cufft exec error: " << res << endl; return 1;}
cudaMemcpy(out, d_out, out_mem_size, cudaMemcpyDeviceToHost);
cudaCheckErrors("cuda memcpy D2H fail");
for(int i=0; i<n/2; i++){
cout << "out: " << i << " " << out[i].real() << " " << out[i].imag() << endl;
}
return 0;
}
查看。上述代码可以用 -DIN_PLACE
重新编译,以查看in-place变换的行为,并修改必要的代码。
Review the documentation on what is necessary to do an in-place transform in the real to complex case. The above code can be recompiled with -DIN_PLACE
to see the behavior for an in-place transform, and the necessary code changes.
这篇关于CuFFT双至复合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!