Problem description
I'm having problems grasping why my function that finds maximum and minimum in a range of doubles using CUBLAS doesn't work properly.
The code is as follows:
Anyone with a golly good answer to my problem? I am a tad sad at the moment :(
Recommended answer
One of your arguments to both the cublasIdamax and cublasIdamin calls is wrong. The incx argument in BLAS level 1 calls should always be the stride of the input in elements (words), not bytes. So I suspect that you want something more like:
stat = cublasIdamax(handle, n, d_values, 1, max_idx);
if (stat != CUBLAS_STATUS_SUCCESS)
    printf("Max failed\n");

stat = cublasIdamin(handle, n, d_values, 1, min_idx);
if (stat != CUBLAS_STATUS_SUCCESS)
    printf("min failed\n");
By using sizeof(double), you are telling the routines to use a stride of 8 elements, so the calls read past the end of the input array's allocation and into uninitialised memory. I presume the data in d_values are actually contiguous, i.e. a stride of 1.
Edit: Here is a complete runnable example which works correctly. Note that I switched the code to single precision (cublasIsamax/cublasIsamin) because I don't presently have access to double-precision-capable hardware:
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <cstdio>
#include <cstdlib>
#include <ctime>

typedef float Real;

// Finds the 1-based indices of the elements with the largest and
// smallest absolute values using cuBLAS.
void findMaxAndMinGPU(Real* values, int* max_idx, int* min_idx, int n)
{
    Real* d_values;
    cublasHandle_t handle;
    cublasStatus_t stat;

    // Copy the input array to the device.
    cudaMalloc((void**) &d_values, sizeof(Real) * n);
    cudaMemcpy(d_values, values, sizeof(Real) * n, cudaMemcpyHostToDevice);

    cublasCreate(&handle);

    // incx = 1: the data are contiguous (stride counted in elements).
    stat = cublasIsamax(handle, n, d_values, 1, max_idx);
    if (stat != CUBLAS_STATUS_SUCCESS)
        printf("Max failed\n");

    stat = cublasIsamin(handle, n, d_values, 1, min_idx);
    if (stat != CUBLAS_STATUS_SUCCESS)
        printf("min failed\n");

    cudaFree(d_values);
    cublasDestroy(handle);
}

int main(void)
{
    const int vmax = 1000, nvals = 10000;
    float vals[nvals];

    // Fill with random non-negative values.
    srand(time(NULL));
    for (int j = 0; j < nvals; j++) {
        vals[j] = float(rand() % vmax);
    }

    int minIdx, maxIdx;
    findMaxAndMinGPU(vals, &maxIdx, &minIdx, nvals);

    // CPU reference using 0-based indices.
    int cmin = 0, cmax = 0;
    for (int i = 1; i < nvals; i++) {
        cmin = (vals[i] < vals[cmin]) ? i : cmin;
        cmax = (vals[i] > vals[cmax]) ? i : cmax;
    }

    fprintf(stdout, "%d %d %d %d\n", minIdx, cmin, maxIdx, cmax);

    return 0;
}

which, when compiled and run, gives:
$ g++ -I/usr/local/cuda/include -L/usr/local/cuda/lib cublastest.cc -lcudart -lcublas
$ ./a.out
273 272 85 84
Note that CUBLAS follows the FORTRAN convention and uses 1-based indexing rather than 0-based indexing, which is why each CUBLAS index is one larger than the corresponding CPU index.