Problem description
To test calling printf() on the device, I wrote a simple program that copies a moderately sized array to the device and prints the values of the device array to the screen. Although the array is copied to the device correctly, printf() does not work correctly: the first several hundred numbers are missing from the output. The array size in the code is 4096. Is this a bug, or am I not using the function properly? Thanks in advance.
Edit: My GPU is a GeForce GTX 550 Ti, with compute capability 2.1.
#include <stdio.h>
#include <stdlib.h>

#define N 4096

__global__ void Printcell(float *d_Array, int n){
    int k = 0;

    printf("\n=========== data of d_Array on device==============\n");
    for( k = 0; k < n; k++ ){
        printf("%f ", d_Array[k]);
        if((k+1)%6 == 0) printf("\n");
    }
    printf("\n\nTotally %d elements has been printed", k);
}

int main(){
    int i = 0;
    float Array[N] = {0}, rArray[N] = {0};
    float *d_Array;

    for(i=0;i<N;i++)
        Array[i] = i;

    cudaMalloc((void**)&d_Array, N*sizeof(float));
    cudaMemcpy(d_Array, Array, N*sizeof(float), cudaMemcpyHostToDevice);
    cudaDeviceSynchronize();

    Printcell<<<1,1>>>(d_Array, N); //Print the device array by a kernel
    cudaDeviceSynchronize();

    /* Copy the device array back to host to see if it was correctly copied */
    cudaMemcpy(rArray, d_Array, N*sizeof(float), cudaMemcpyDeviceToHost);

    printf("\n\n");
    for(i=0;i<N;i++){
        printf("%f ", rArray[i]);
        if((i+1)%6 == 0) printf("\n");
    }
}
Recommended answer
printf called from the device has a limited queue. It is intended for small-scale, debug-style output, not large-scale output.
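According to the CUDA programming guide, the output buffer for device-side printf() is set to a fixed size before the kernel launches. The buffer is circular: if a kernel produces more output than fits, the oldest output is overwritten.
Your in-kernel printf output overran the buffer, so the first printed elements were lost (overwritten) before the buffer was flushed to the standard I/O queue.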
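One way to act on this explanation, if the full output is really needed, is to enlarge the device printf buffer before launching the kernel. The sketch below is an illustration and not part of the original answer: it uses the runtime calls cudaDeviceGetLimit and cudaDeviceSetLimit with cudaLimitPrintfFifoSize. The 8 MB value and the kernel name PrintAll are assumptions chosen for the example. The limit has to be set before the first kernel that calls printf is launched.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 4096

// Same pattern as the kernel in the question: a single thread prints
// every element. (PrintAll is a hypothetical name for this sketch.)
__global__ void PrintAll(const float *d_Array, int n) {
    for (int k = 0; k < n; k++) {
        printf("%f ", d_Array[k]);
        if ((k + 1) % 6 == 0) printf("\n");
    }
    printf("\n%d elements printed\n", n);
}

int main(void) {
    static float h_Array[N];
    float *d_Array = NULL;
    size_t fifoSize = 0;

    for (int i = 0; i < N; i++) h_Array[i] = (float)i;

    // Report the current size of the device printf buffer.
    cudaDeviceGetLimit(&fifoSize, cudaLimitPrintfFifoSize);
    printf("printf FIFO size before: %zu bytes\n", fifoSize);

    // Enlarge the buffer BEFORE any kernel that uses printf is launched.
    // 8 MB is an arbitrary value picked for this example (assumption).
    cudaError_t err = cudaDeviceSetLimit(cudaLimitPrintfFifoSize,
                                         (size_t)8 * 1024 * 1024);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    cudaDeviceGetLimit(&fifoSize, cudaLimitPrintfFifoSize);
    printf("printf FIFO size after: %zu bytes\n", fifoSize);

    cudaMalloc((void **)&d_Array, N * sizeof(float));
    cudaMemcpy(d_Array, h_Array, N * sizeof(float), cudaMemcpyHostToDevice);

    PrintAll<<<1, 1>>>(d_Array, N);
    // Synchronizing flushes the device printf buffer to the host's stdout.
    cudaDeviceSynchronize();

    cudaFree(d_Array);
    return 0;
}
```

For anything beyond debugging, copying the data back with cudaMemcpy and printing it on the host, as the question's code already does for verification, avoids the buffer limit entirely.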