Cuda从设备内存创建3d纹理和cudaArray

Cuda从设备内存创建3d纹理和cudaArray

本文介绍了Cuda从设备内存创建3d纹理和cudaArray(3d)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

im试图从设备阵列的一部分创建纹理3d.

im trying to create a texture 3d from a part of a device array.

为此,这些是我的步骤:

To do this, these are my steps:

  1. malloc设备阵列
  2. 写入设备阵列
  3. 创建CudaArray(3D)
  4. 将纹理绑定到CudaArray

即时通讯的方式不会产生编译器错误,但是当我运行cuda-memcheck时,即时通讯试图从Texture中获取数据时会失败.

The way im doing it it creates no compiler errors, but when i run cuda-memcheck it's failing when im trying to fetch data from the Texture.

这就是为什么我猜我声明纹理数组错误.这是我访问纹理的方式:

Thats why i'm guessing i declared the texture Array wrong.here is how i access the texture:

我执行上述步骤的方式:

The way im doing the steps mentioned above:

1.Malloc设备阵列

1.Malloc Device Array

cudaMalloc((void **)&d_Noise, sqrSizeNoise*nNoise*sizeof(float));

2.写入设备阵列

curandCreateGenerator(&gen,CURAND_RNG_PSEUDO_DEFAULT);
curandSetPseudoRandomGeneratorSeed(gen,Seed);
curandGenerateUniform(gen, d_Noise, sqrSizeNoise*nNoise);
curandDestroyGenerator(gen);

3 + 4.创建Cuda数组并将其绑定到纹理(我猜这里是错误的地方)

3+4.Creating the Cuda Array and binding it to the texture (Im guessing the mistake is here)

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();//cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
cudaArray *d_cuArr;
cudaMalloc3DArray(&d_cuArr, &channelDesc, make_cudaExtent(SizeNoise,SizeNoise,SizeNoise), 0);
cudaMemcpy3DParms copyParams = {0};

//Loop for every separated Noise Texture (nNoise = 4)
for(int i = 0; i < nNoise; i++){

    //initialize the textures
    NoiseTextures[i] = texture<float, 3, cudaReadModeElementType>(1,cudaFilterModeLinear,cudaAddressModeWrap,channelDesc);

    //Array creation
    //+(sqrSizeNoise*i) is to separate the created Noise Array into smaller parts with the size of SizeNoise^3
    copyParams.srcPtr   = make_cudaPitchedPtr(d_Noise+(sqrSizeNoise*i), SizeNoise*sizeof(float), SizeNoise, SizeNoise);
    copyParams.dstArray = d_cuArr;
    copyParams.extent   = make_cudaExtent(SizeNoise,SizeNoise,SizeNoise);
    copyParams.kind     = cudaMemcpyDeviceToDevice;
    checkCudaErrors(cudaMemcpy3D(&copyParams));
    //Array creation End

    //new Bind
    // set texture parameters
    NoiseTextures[i].normalized = true;                      // access with normalized texture coordinates
    NoiseTextures[i].filterMode = cudaFilterModeLinear;      // linear interpolation
    NoiseTextures[i].addressMode[0] = cudaAddressModeWrap;   // wrap texture coordinates
    NoiseTextures[i].addressMode[1] = cudaAddressModeWrap;
    NoiseTextures[i].addressMode[2] = cudaAddressModeWrap;

    // bind array to 3D texture
    checkCudaErrors(cudaBindTextureToArray(NoiseTextures[i], d_cuArr, channelDesc));
    //end Bind
}
cudaFreeArray(d_cuArr);

我已将此代码段粘贴到Pastebin,以便更容易查看颜色等. http://pastebin.com/SM3dYd38

I've Pasted this code snippet to Pastebin so its easier to look at with colors etc.http://pastebin.com/SM3dYd38

我希望我清楚地描述了我的问题.如果没有,请发表评论!

I hope I clearly described my problem. If not pls comment!

您能帮我吗?感谢您的阅读,

Can you help me with this?Thanks for reading,

很好

这是完整的代码,因此您可以在自己的计算机上尝试使用它:

Here is a complete code so you can try it on your own machine:

#include <helper_cuda.h>
#include <helper_functions.h>
#include <helper_cuda_gl.h>
#include <texture_types.h>
#include <cuda_runtime.h>
#include <curand.h>

static texture<float, 3, cudaReadModeElementType> NoiseTextures[4];//texture Array
float *d_NoiseTest;//Device Array with random floats
int SizeNoiseTest = 32;
int sqrSizeNoiseTest = 32768;

void CreateTexture();

__global__ void AccesTexture(texture<float, 3, cudaReadModeElementType>* NoiseTextures)
{
        int test = tex3D(NoiseTextures[0],threadIdx.x,threadIdx.y,threadIdx.z);//by using this the error occurs
}

int
main(int argc, char **argv)
{
        CreateTexture();
}
void CreateTexture()
{
        //curand Random Generator (needs compiler link -lcurand)
        curandGenerator_t gen;
        cudaMalloc((void **)&d_NoiseTest, sqrSizeNoiseTest*4*sizeof(float));//Allocation of device Array
        curandCreateGenerator(&gen,CURAND_RNG_PSEUDO_DEFAULT);
        curandSetPseudoRandomGeneratorSeed(gen,1234ULL);
        curandGenerateUniform(gen, d_NoiseTest, sqrSizeNoiseTest*4);//writing data to d_NoiseTest
        curandDestroyGenerator(gen);

        //cudaArray Descriptor
        cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
        //cuda Array
        cudaArray *d_cuArr;
        cudaMalloc3DArray(&d_cuArr, &channelDesc, make_cudaExtent(SizeNoiseTest*sizeof(float),SizeNoiseTest,SizeNoiseTest), 0);
        cudaMemcpy3DParms copyParams = {0};

        //Loop for every separated Noise Texture (4 = 4)
        for(int i = 0; i < 4; i++){

                //initialize the textures
                NoiseTextures[i] = texture<float, 3, cudaReadModeElementType>(1,cudaFilterModeLinear,cudaAddressModeWrap,channelDesc);

                //Array creation
                //+(sqrSizeNoise*i) is to separate the created Noise Array into smaller parts with the size of SizeNoise^3
                copyParams.srcPtr   = make_cudaPitchedPtr(d_NoiseTest+(sqrSizeNoiseTest*i), SizeNoiseTest*sizeof(float), SizeNoiseTest, SizeNoiseTest);
                copyParams.dstArray = d_cuArr;
                copyParams.extent   = make_cudaExtent(SizeNoiseTest*sizeof(float),SizeNoiseTest,SizeNoiseTest);
                copyParams.kind     = cudaMemcpyDeviceToDevice;
                checkCudaErrors(cudaMemcpy3D(&copyParams));
                //Array creation End

                //new Bind
                // set texture parameters
                NoiseTextures[i].normalized = true;                      // access with normalized texture coordinates
                NoiseTextures[i].filterMode = cudaFilterModeLinear;      // linear interpolation
                NoiseTextures[i].addressMode[0] = cudaAddressModeWrap;   // wrap texture coordinates
                NoiseTextures[i].addressMode[1] = cudaAddressModeWrap;
                NoiseTextures[i].addressMode[2] = cudaAddressModeWrap;

                // bind array to 3D texture
                checkCudaErrors(cudaBindTextureToArray(NoiseTextures[i], d_cuArr, channelDesc));
                //end Bind
        }
        cudaFreeArray(d_cuArr);

        AccesTexture<<<1,dim3(4,4,4)>>>(NoiseTextures);
}

不过,您需要链接-lcurand.并包含CUDA-6.0/samples/common/inc

You need to link -lcurand though. And include CUDA-6.0/samples/common/inc

我现在在此代码中遇到另一个错误

Im now getting a different error in this code

即使代码和我的原始代码完全一样. -我开始变得完全困惑.谢谢您的帮助

Even though it's the exact same code then my original. - Im starting to get completely confused.Thank you for your help

推荐答案

以下是一个有效的示例,显示了创建纹理对象数组的过程,大致遵循您提供的代码的路径.通过与我在此处放置的纹理参考代码进行比较,您可以看到第一组纹理是从第一个纹理对象(即第一个内核调用)与从纹理参考示例读取的一组数值相同(您可能需要调整两个示例代码的网格大小以匹配).

Here's a worked example showing the creation of an array of texture objects, roughly following the path of the code you provided. You can see, by comparing to the texture reference code I placed here, that the first set of texture reads from the first texture object (i.e. the first kernel call) are the same numerical values as the set of reads from the texture reference example (you may need to adjust the grid size of the two example codes to match).

使用纹理对象需要具有3.0或更高版本的计算能力.

Texture object usage requires compute capability 3.0 or higher.

示例:

$ cat t507.cu
#include <helper_cuda.h>
#include <curand.h>
#define NUM_TEX 4

const int SizeNoiseTest = 32;
const int cubeSizeNoiseTest = SizeNoiseTest*SizeNoiseTest*SizeNoiseTest;
static cudaTextureObject_t texNoise[NUM_TEX];

__global__ void AccesTexture(cudaTextureObject_t my_tex)
{
        float test = tex3D<float>(my_tex,(float)threadIdx.x,(float)threadIdx.y,(float)threadIdx.z);//by using this the error occurs
        printf("thread: %d,%d,%d, value: %f\n", threadIdx.x, threadIdx.y, threadIdx.z, test);
}

void CreateTexture()
{

    float *d_NoiseTest;//Device Array with random floats
    cudaMalloc((void **)&d_NoiseTest, cubeSizeNoiseTest*sizeof(float));//Allocation of device Array
    for (int i = 0; i < NUM_TEX; i++){
        //curand Random Generator (needs compiler link -lcurand)
        curandGenerator_t gen;
        curandCreateGenerator(&gen,CURAND_RNG_PSEUDO_DEFAULT);
        curandSetPseudoRandomGeneratorSeed(gen,1235ULL+i);
        curandGenerateUniform(gen, d_NoiseTest, cubeSizeNoiseTest);//writing data to d_NoiseTest
        curandDestroyGenerator(gen);

        //cudaArray Descriptor
        cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
        //cuda Array
        cudaArray *d_cuArr;
        checkCudaErrors(cudaMalloc3DArray(&d_cuArr, &channelDesc, make_cudaExtent(SizeNoiseTest*sizeof(float),SizeNoiseTest,SizeNoiseTest), 0));
        cudaMemcpy3DParms copyParams = {0};


        //Array creation
        copyParams.srcPtr   = make_cudaPitchedPtr(d_NoiseTest, SizeNoiseTest*sizeof(float), SizeNoiseTest, SizeNoiseTest);
        copyParams.dstArray = d_cuArr;
        copyParams.extent   = make_cudaExtent(SizeNoiseTest,SizeNoiseTest,SizeNoiseTest);
        copyParams.kind     = cudaMemcpyDeviceToDevice;
        checkCudaErrors(cudaMemcpy3D(&copyParams));
        //Array creation End

        cudaResourceDesc    texRes;
        memset(&texRes, 0, sizeof(cudaResourceDesc));
        texRes.resType = cudaResourceTypeArray;
        texRes.res.array.array  = d_cuArr;
        cudaTextureDesc     texDescr;
        memset(&texDescr, 0, sizeof(cudaTextureDesc));
        texDescr.normalizedCoords = false;
        texDescr.filterMode = cudaFilterModeLinear;
        texDescr.addressMode[0] = cudaAddressModeClamp;   // clamp
        texDescr.addressMode[1] = cudaAddressModeClamp;
        texDescr.addressMode[2] = cudaAddressModeClamp;
        texDescr.readMode = cudaReadModeElementType;
        checkCudaErrors(cudaCreateTextureObject(&texNoise[i], &texRes, &texDescr, NULL));}
}

int main(int argc, char **argv)
{
        CreateTexture();
        AccesTexture<<<1,dim3(2,2,2)>>>(texNoise[0]);
        AccesTexture<<<1,dim3(2,2,2)>>>(texNoise[1]);
        AccesTexture<<<1,dim3(2,2,2)>>>(texNoise[2]);
        checkCudaErrors(cudaPeekAtLastError());
        checkCudaErrors(cudaDeviceSynchronize());
        return 0;
}

编译为:

$ nvcc -arch=sm_30 -I/shared/apps/cuda/CUDA-v6.0.37/samples/common/inc -lcurand -o t507 t507.cu

输出:

$ cuda-memcheck ./t507
========= CUDA-MEMCHECK
thread: 0,0,0, value: 0.310691
thread: 1,0,0, value: 0.627906
thread: 0,1,0, value: 0.638900
thread: 1,1,0, value: 0.665186
thread: 0,0,1, value: 0.167465
thread: 1,0,1, value: 0.565227
thread: 0,1,1, value: 0.397606
thread: 1,1,1, value: 0.503013
thread: 0,0,0, value: 0.809163
thread: 1,0,0, value: 0.795669
thread: 0,1,0, value: 0.808565
thread: 1,1,0, value: 0.847564
thread: 0,0,1, value: 0.853998
thread: 1,0,1, value: 0.688446
thread: 0,1,1, value: 0.733255
thread: 1,1,1, value: 0.649379
thread: 0,0,0, value: 0.040824
thread: 1,0,0, value: 0.087417
thread: 0,1,0, value: 0.301392
thread: 1,1,0, value: 0.298669
thread: 0,0,1, value: 0.161962
thread: 1,0,1, value: 0.316443
thread: 0,1,1, value: 0.452077
thread: 1,1,1, value: 0.477722
========= ERROR SUMMARY: 0 errors

在这种情况下,我使用同一内核(多次调用)从单个纹理对象读取.应该可以将多个对象传递给同一个内核,但是,如果可以避免在代码中使用单个 warp ,则不建议从多个纹理中读取单个 warp .实际问题位于四级,我不希望涉及.最好安排代码,以便在任何给定的周期内从相同的纹理对象读取扭曲.

In this case I'm using the same kernel, called multiple times, to read from the individual texture objects. It should be possible to pass multiple objects to the same kernel, however it is not advisable to have a single warp read from multiple textures, if that can be avoided in your code. The actual issue resides at the quad level, which I'd prefer not to get into. It's best if you can arrange your code so that a warp is reading from the same texture object, on any given cycle.

请注意,为简化表示,此CreateTexture()函数在循环处理期间会覆盖先前分配的设备指针,例如d_cuArr.这不是非法的或功能性的问题,但它增加了内存泄漏的可能性.

Note that for simplicity of presentation, this CreateTexture() function overwrites previously allocated device pointers such as d_cuArr, during the processing of the loop. This isn't illegal or a functional issue, but it raises the possibility of memory leaks.

如果您担心的话,我假设您可以修改代码来处理这些代码的解除分配.这段代码的目的是演示使事情正常进行的方法.

I assume you can modify the code to handle deallocation of those if this is a concern. The purpose of this code is to demonstrate the method to get things working.

这篇关于Cuda从设备内存创建3d纹理和cudaArray(3d)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-30 04:21