Problem description
I am having trouble fetching a texture of floats. The texture is defined as follows:
texture<float, 2, cudaReadModeElementType> cornerTexture;
The binding and parameter settings are:
cornerTexture.addressMode[0] = cudaAddressModeClamp;
cornerTexture.addressMode[1] = cudaAddressModeClamp;
cornerTexture.filterMode = cudaFilterModePoint;
cornerTexture.normalized = false;
cudaChannelFormatDesc cornerDescription = cudaCreateChannelDesc<float>();
cudaBindTexture2D(0, &cornerTexture, cornerImage->imageData_device, &cornerDescription, cornerImage->width, cornerImage->height, cornerImage->widthStep);
height and width are the sizes of the two dimensions in elements, and widthStep is the row pitch in bytes. In-kernel access occurs as follows:
thisValue = tex2D(cornerTexture, thisPixel.x, thisPixel.y);
printf("thisPixel.x: %i thisPixel.y: %i thisValue: %f\n", thisPixel.x, thisPixel.y, thisValue);
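To make the units concrete: with width and height counted in elements and widthStep in bytes, element (x, y) of the linear buffer sits y * widthStep bytes past the base pointer, plus x floats. The host-side sketch below spells out that arithmetic. pitchedElement is a hypothetical helper for illustration, not a CUDA function, and it assumes a float image as in the question.

```cpp
#include <cassert>
#include <cstddef>

// Returns a pointer to element (x, y) of a pitched 2D float image.
// widthElems and height are in elements; pitchBytes (widthStep) is in bytes.
inline const float* pitchedElement(const float* base, std::size_t widthElems,
                                   std::size_t height, std::size_t pitchBytes,
                                   std::size_t x, std::size_t y) {
    // Each row must be wide enough to hold widthElems floats.
    assert(pitchBytes >= widthElems * sizeof(float));
    assert(x < widthElems && y < height);
    // Advance y rows in *bytes*, then x elements within that row.
    const char* row = reinterpret_cast<const char*>(base) + y * pitchBytes;
    return reinterpret_cast<const float*>(row) + x;
}
```

If widthStep were mistakenly given in elements rather than bytes, every row after the first would be read from the wrong place, so that is one thing worth ruling out before suspecting the driver.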
thisValue should always be a non-negative float, but printf() is giving me strange, useless values that are different from what the linear memory actually stores. I have tried offsetting the access by 0.5f on both coordinates, but it gives me the same wrong results.
Any ideas?
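Why the 0.5f offset makes no difference: per the CUDA Programming Guide's description of point filtering with unnormalized coordinates, the fetch returns the texel at floor(coordinate), and clamp addressing pins out-of-range indices to the edge. The sketch below models that per-axis rule on the host. pointClampTexel is a hypothetical name, and the sketch ignores the hardware's internal fixed-point coordinate handling.

```cpp
#include <cassert>
#include <cmath>

// Host-side model of the texel index chosen along one axis by a fetch with
// cudaFilterModePoint, cudaAddressModeClamp and unnormalized coordinates:
// i = floor(coord), clamped to [0, size - 1].
inline int pointClampTexel(float coord, int size) {
    assert(size > 0);
    int i = static_cast<int>(std::floor(coord));
    if (i < 0) i = 0;                 // clamp below the first texel
    if (i > size - 1) i = size - 1;   // clamp past the last texel
    return i;
}
```

For an integer coordinate x, both x and x + 0.5f select the same texel, so getting identical wrong results with and without the offset points away from the coordinates and toward the binding itself.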
Update: There seems to be a hidden alignment requirement. From what I can deduce, the pitch passed to cudaBindTexture2D() needs to be a multiple of 32 bytes. For example, the following call (pitch of 40 bytes) gives incorrect results when fetching the texture:
cudaBindTexture2D(0, &debugTexture, deviceFloats, &debugDescription, 10, 32, 40);
but the following one (the same array with its width and height swapped, pitch of 128 bytes) works well:
cudaBindTexture2D(0, &debugTexture, deviceFloats, &debugDescription, 32, 10, 128);
I'm not sure whether I'm missing something or there really is a constraint on the pitch.
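The pitch rule deduced in the update can be expressed as a divisibility test, together with the round-up needed to pad each row. This is a sketch: the 32-byte figure is the author's empirical deduction rather than a documented constant, and isPitchAligned and roundUpPitch are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>

// True if a row pitch (in bytes) is a multiple of the alignment (in bytes).
inline bool isPitchAligned(std::size_t pitchBytes, std::size_t alignBytes) {
    assert(alignBytes > 0);
    return pitchBytes % alignBytes == 0;
}

// Smallest pitch >= rowBytes that is a multiple of alignBytes,
// i.e. how far each row would need to be padded.
inline std::size_t roundUpPitch(std::size_t rowBytes, std::size_t alignBytes) {
    assert(alignBytes > 0);
    return (rowBytes + alignBytes - 1) / alignBytes * alignBytes;
}
```

With 10 floats per row the pitch is 40 bytes, which fails the test, while 32 floats per row gives 128 bytes, which passes. Padding the 10-float rows to roundUpPitch(40, 32) = 64 bytes would satisfy the same rule. Later CUDA releases expose the real, hardware-dependent requirement as cudaDeviceProp::texturePitchAlignment, which is safer than a hard-coded value.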
Update 2: I have filed a bug report with Nvidia. Those who are interested can view it in their developer zone, but I will post the reply back here.
Nvidia reply to bug report:
"The problem here is that the memory bound to the 2D texture does not have the proper alignment restrictions. Both the base offset of the texture memory, and the pitch, have certain HW dependant alignment restrictions. However, currently in the CUDA API, we only expose the base offset restriction as a device property, and not the pitch restriction.
The pitch restriction will be addressed in a future CUDA release. Meanwhile, it's recommended that apps use cudaMallocPitch() when allocating pitched memory, so that the driver takes care of satisfying all restrictions."
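Following that recommendation, the usual pattern is to let the driver choose the pitch with cudaMallocPitch(&devPtr, &pitch, width * sizeof(float), height), copy the tightly packed image in with cudaMemcpy2D(devPtr, pitch, hostPtr, width * sizeof(float), width * sizeof(float), height, cudaMemcpyHostToDevice), and pass that same pitch to cudaBindTexture2D. (The texture reference API used in this question was later deprecated and removed in CUDA 12.0; with texture objects, the same pitched allocation is described with cudaResourceTypePitch2D.) The row-by-row copy that cudaMemcpy2D performs can be sketched on the host as below. copy2D is a hypothetical helper for illustration only, not part of the CUDA API, and needs no GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Host-only sketch of a cudaMemcpy2D-style copy: copy `height` rows of
// `widthBytes` bytes each from a source with pitch srcPitch into a
// destination with pitch dstPitch. Padding bytes at the end of each
// destination row are left untouched.
inline void copy2D(void* dst, std::size_t dstPitch,
                   const void* src, std::size_t srcPitch,
                   std::size_t widthBytes, std::size_t height) {
    assert(widthBytes <= dstPitch && widthBytes <= srcPitch);
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t y = 0; y < height; ++y) {
        std::memcpy(d + y * dstPitch, s + y * srcPitch, widthBytes);
    }
}
```

For example, a 3x2 float image copied into rows padded to 32 bytes places row 1 at float index 8 of the destination, leaving indices 3 through 7 as padding.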