How do I read back a CUDA texture for testing?

Problem description

Ok, so far, I can create an array on the host computer (of type float), and copy it to the gpu, then bring it back to the host as another array (to test if the copy was successful by comparing to the original).

I then create a CUDA array from the array on the GPU. Then I bind that array to a CUDA texture.

I now want to read that texture back and compare it with the original array (again, to test that it copied correctly). I saw some sample code that uses the readTexels() kernel shown below. It doesn't seem to work for me... (basically everything works except for the section in the bindToTexture(float* deviceArray) function starting at the readTexels(SIZE, testArrayDevice) line).

Any suggestions of a different way to do this? Or are there some obvious problems I missed in my code?

Thanks for the help guys!

#include <stdio.h>
#include <assert.h>
#include <cuda.h>

#define SIZE 20;

//Create a channel description to use.
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);

//Create a texture to use.
texture<float, 2, cudaReadModeElementType> cudaTexture;
//cudaTexture.filterMode = cudaFilterModeLinear;
//cudaTexture.normalized = false;

__global__ void readTexels(int amount, float *Array)
{
  int index = blockIdx.x * blockDim.x + threadIdx.x;

  if (index < amount)
  {
    float x = tex1D(cudaTexture, float(index));
    Array[index] = x;
  }
}

float* copyToGPU(float* hostArray, int size)
{
  //Create pointers, one for the array to be on the device, and one for bringing it back to the host for testing.
  float* deviceArray;
  float* testArray;

  //Allocate some memory for the two arrays so they don't get overwritten.
  testArray = (float *)malloc(sizeof(float)*size);

  //Allocate some memory for the array to be put onto the GPU device.
  cudaMalloc((void **)&deviceArray, sizeof(float)*size);

  //Actually copy the array from hostArray to deviceArray.
  cudaMemcpy(deviceArray, hostArray, sizeof(float)*size, cudaMemcpyHostToDevice);

  //Copy the deviceArray back to testArray in host memory for testing.
  cudaMemcpy(testArray, deviceArray, sizeof(float)*size, cudaMemcpyDeviceToHost);

  //Make sure contents of testArray match the original contents in hostArray.
  for (int i = 0; i < size; i++)
  {
    if (hostArray[i] != testArray[i])
    {
      printf("Location [%d] does not match in hostArray and testArray.\n", i);
    }
  }

  //Don't forget free these arrays after you're done!
  free(testArray);

  return deviceArray; //TODO: FREE THE DEVICE ARRAY VIA cudaFree(deviceArray);
}

cudaArray* bindToTexture(float* deviceArray)
{
  //Create a CUDA array to translate deviceArray into.
  cudaArray* cuArray;

  //Allocate memory for the CUDA array.
  cudaMallocArray(&cuArray, &cudaTexture.channelDesc, SIZE, 1);

  //Copy the deviceArray into the CUDA array.
  cudaMemcpyToArray(cuArray, 0, 0, deviceArray, sizeof(float)*SIZE, cudaMemcpyHostToDevice);

  //Release the deviceArray
  cudaFree(deviceArray);

  //Bind the CUDA array to the texture.
  cudaBindTextureToArray(cudaTexture, cuArray);

  //Make a test array on the device and on the host to verify that the texture has been saved correctly.
  float* testArrayDevice;
  float* testArrayHost;

  //Allocate memory for the two test arrays.
  cudaMalloc((void **)&testArray, sizeof(float)*SIZE);
  testArrayHost = (float *)malloc(sizeof(float)*SIZE);

  //Read the texels of the texture to the test array in the device.
  readTexels(SIZE, testArrayDevice);

  //Copy the device test array to the host test array.
  cudaMemcpy(testArrayHost, testArrayDevice, sizeof(float)*SIZE, cudaMemcpyDeviceToHost);

  //Print contents of the array out.
  for (int i = 0; i < SIZE; i++)
  {
    printf("%f\n", testArrayHost[i]);
  }

  //Free the memory for the test arrays.
  free(testArrayHost);
  cudaFree(testArrayDevice);

  return cuArray; //TODO: UNBIND THE CUDA TEXTURE VIA cudaUnbindTexture(cudaTexture);
  //TODO: FREE THE CUDA ARRAY VIA cudaFree(cuArray);
}


int main(void)
{
  float* hostArray;

  hostArray = (float *)malloc(sizeof(float)*SIZE);

  for (int i = 0; i < SIZE; i++)
  {
    hostArray[i] = 10.f + i;
  }

  float* deviceAddy = copyToGPU(hostArray, SIZE);

  free(hostArray);

  return 0;
}

Solution

Briefly:

------------- in your main.cu ---------------------------------------------------------------------------------------

-1. Define the texture as a global variable

       texture<unsigned int, 2, cudaReadModeElementType> refTex; // global variable!
       // meaning: address the texture with (x,y) (2D) and get an unsigned int

In the main function:

-2. Use arrays combined with the texture

    cudaArray* myArray; // declaration
    // ask for memory
    cudaMallocArray(&myArray,
                    &refTex.channelDesc, /* with this you don't need to fill in a channel descriptor */
                    width,
                    height);

-3. Copy data from the CPU to the GPU (to the array)

    cudaMemcpyToArray(myArray,                    // destination: the array
                      0, 0,                       // offsets
                      sourceData,                 // pointer, uint*
                      width*height*sizeof(uint),  // total number of bytes to be copied
                      cudaMemcpyHostToDevice);

-4. Bind the texture and the array

    cudaBindTextureToArray(refTex, myArray);

-5. Change some parameters in the texture

    refTex.normalized = false; // use unnormalized coordinates: [0,width[ and [0,height[ instead of [0,1[
    refTex.addressMode[0] = cudaAddressModeClamp; // if my indexing is out of bounds: automatically use a valid index (0 if the index is negative, the last one if it is too large)
    refTex.addressMode[1] = cudaAddressModeClamp;
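Steps 2 and 3 can also be tested on their own, before any texture or kernel is involved: copy the data into the CUDA array and copy it straight back with `cudaMemcpy2DFromArray`. The following is only a sketch under a few assumptions: it uses floats and a width x 1 array like the question does, and `CUDA_CHECK` and `countArrayMismatches` are made-up helper names, not part of CUDA.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed CUDA runtime call and return -1
// from the enclosing function.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      fprintf(stderr, "%s failed: %s\n", #call,                       \
              cudaGetErrorString(err_));                              \
      return -1;                                                      \
    }                                                                 \
  } while (0)

// Hypothetical helper: copies `width` floats into a width x 1 CUDA array
// and straight back to the host. Returns the number of mismatching
// elements, or -1 if a CUDA call fails (in that case the array is leaked,
// which is tolerable for a throwaway test).
int countArrayMismatches(int width)
{
  assert(width > 0);
  const size_t rowBytes = width * sizeof(float);

  std::vector<float> src(width), back(width, 0.0f);
  for (int i = 0; i < width; ++i) src[i] = 10.0f + i;

  // Channel descriptor for one 32-bit float per texel.
  cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();

  cudaArray_t arr = nullptr;
  CUDA_CHECK(cudaMallocArray(&arr, &desc, width, 1));

  // Host -> CUDA array (one row of rowBytes bytes).
  CUDA_CHECK(cudaMemcpy2DToArray(arr, 0, 0, src.data(), rowBytes,
                                 rowBytes, 1, cudaMemcpyHostToDevice));

  // CUDA array -> host, no texture fetch involved.
  CUDA_CHECK(cudaMemcpy2DFromArray(back.data(), rowBytes, arr, 0, 0,
                                   rowBytes, 1, cudaMemcpyDeviceToHost));

  // A CUDA array is released with cudaFreeArray, not cudaFree.
  CUDA_CHECK(cudaFreeArray(arr));

  int mismatches = 0;
  for (int i = 0; i < width; ++i) {
    if (src[i] != back[i]) ++mismatches;
  }
  return mismatches;
}
```

If `countArrayMismatches(20)` returns 0, the array holds the right data, and any remaining problem is in the binding or in the kernel that fetches from the texture.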

---------- in the kernel --------------------------------------------------------

    // find out which indexes (f,c) this thread processes
    uint f = (blockIdx.x * blockDim.x) + threadIdx.x;
    uint c = (blockIdx.y * blockDim.y) + threadIdx.y;

    // this is curious and necessary: the indexes for reading from a texture
    // are floats! Even if you are certain you want to access (4,5), you have to
    // match the texel "center", which is (4.5, 5.5)
    uint read = tex2D(refTex, c+0.5f, f+0.5f); // refTex is a global variable

Now you can process read and write the results to another region of the device's global memory, not to the texture itself!
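For the original test (20 floats, read back through the texture and compared on the host), the whole round trip might look like the sketch below. Note the swap: texture references such as `refTex` were removed in CUDA 12, so this sketch uses the texture object API (`cudaCreateTextureObject`) instead of the global texture reference shown above. `readTexels` is the kernel name from the question; `CUDA_CHECK` and `countTextureMismatches` are hypothetical helper names. A kernel also has to be launched with the `<<<blocks, threads>>>` syntax rather than called like an ordinary function.

```cuda
#include <cassert>
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed CUDA runtime call and return -1
// from the enclosing function.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      fprintf(stderr, "%s failed: %s\n", #call,                       \
              cudaGetErrorString(err_));                              \
      return -1;                                                      \
    }                                                                 \
  } while (0)

// Each thread fetches one texel from row 0 at its texel center.
__global__ void readTexels(cudaTextureObject_t tex, int width, float* out)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < width) {
    out[i] = tex2D<float>(tex, i + 0.5f, 0.5f);
  }
}

// Hypothetical helper: uploads `width` floats into a width x 1 CUDA array,
// reads them back through a texture object and returns the number of
// mismatches, or -1 if a CUDA call fails (resources leak in that case).
int countTextureMismatches(int width)
{
  assert(width > 0);
  const size_t rowBytes = width * sizeof(float);

  std::vector<float> src(width), back(width, 0.0f);
  for (int i = 0; i < width; ++i) src[i] = 10.0f + i;

  cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
  cudaArray_t arr = nullptr;
  CUDA_CHECK(cudaMallocArray(&arr, &desc, width, 1));
  CUDA_CHECK(cudaMemcpy2DToArray(arr, 0, 0, src.data(), rowBytes,
                                 rowBytes, 1, cudaMemcpyHostToDevice));

  // Describe the resource (the CUDA array) behind the texture.
  cudaResourceDesc resDesc;
  memset(&resDesc, 0, sizeof(resDesc));
  resDesc.resType = cudaResourceTypeArray;
  resDesc.res.array.array = arr;

  // Same settings as step 5: unnormalized coordinates, clamped addressing.
  cudaTextureDesc texDesc;
  memset(&texDesc, 0, sizeof(texDesc));
  texDesc.addressMode[0] = cudaAddressModeClamp;
  texDesc.addressMode[1] = cudaAddressModeClamp;
  texDesc.filterMode = cudaFilterModePoint;
  texDesc.readMode = cudaReadModeElementType;
  texDesc.normalizedCoords = 0;

  cudaTextureObject_t tex = 0;
  CUDA_CHECK(cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr));

  float* dOut = nullptr;
  CUDA_CHECK(cudaMalloc(&dOut, rowBytes));

  const int threads = 128;
  const int blocks = (width + threads - 1) / threads;
  readTexels<<<blocks, threads>>>(tex, width, dOut);
  CUDA_CHECK(cudaGetLastError());

  // cudaMemcpy waits for the kernel to finish before copying.
  CUDA_CHECK(cudaMemcpy(back.data(), dOut, rowBytes,
                        cudaMemcpyDeviceToHost));

  CUDA_CHECK(cudaDestroyTextureObject(tex));
  CUDA_CHECK(cudaFreeArray(arr));
  CUDA_CHECK(cudaFree(dOut));

  int mismatches = 0;
  for (int i = 0; i < width; ++i) {
    if (src[i] != back[i]) ++mismatches;
  }
  return mismatches;
}
```

Calling `countTextureMismatches(20)` from `main` gives you the comparison the question was after: 0 means every texel came back unchanged.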
