Problem description
I want to learn how to copy a three-dimensional array from host memory to device memory. Say I have a 3D array that contains data, for example int host_data[256][256][256]; I want to copy that data to dev_data (a device array) so that host_data[x][y][z] == dev_data[x][y][z]. How can I do that, and how am I supposed to access the dev_data array on the device? A simple example would be very helpful.
Recommended answer
The common way is to flatten the array (make it one-dimensional). You then have to do some arithmetic to map an (x, y, z) triple to a single number: a position in the flattened one-dimensional array.
2D example:
int data[256][256];
int *flattened = &data[0][0];  /* pointer to the first element */
data[x][y] == flattened[x * 256 + y];
3D example:
int data[256][256][256];
int *flattened = &data[0][0][0];  /* pointer to the first element */
data[x][y][z] == flattened[x * 256 * 256 + y * 256 + z];
Or use a wrapper:
__host__ __device__ inline int index(const int x, const int y, const int z) {
    return x * 256 * 256 + y * 256 + z;
}
Knowing that, you can allocate a linear array with cudaMalloc as usual, then use the index function to access the corresponding element in device code.
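Putting the pieces together, here is a minimal sketch of the whole round trip. It is one possible arrangement, not the only one. The names N, index3d (the wrapper above, renamed here to avoid clashing with the legacy C library function index), increment, CUDA_CHECK and result are made up for illustration, and the launch configuration is an assumption. The host array is heap-allocated because 256 * 256 * 256 ints (64 MiB) will not fit on a typical stack.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Edge length from the question; N is an illustrative name.
constexpr int N = 256;

// Abort with a message if a CUDA runtime call fails (illustrative helper).
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err = (call);                                         \
        if (err != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err)); \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Map (x, y, z) to a position in the flattened array (z varies fastest).
__host__ __device__ inline int index3d(int x, int y, int z) {
    return (x * N + y) * N + z;
}

// Hypothetical kernel: adds 1 to each element, addressed by (x, y, z).
__global__ void increment(int *dev_data) {
    int z = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int x = blockIdx.z;
    if (x < N && y < N && z < N) {
        dev_data[index3d(x, y, z)] += 1;
    }
}

int main() {
    const size_t bytes = (size_t)N * N * N * sizeof(int);

    // Heap-allocated host array that still supports host_data[x][y][z].
    int (*host_data)[N][N] = static_cast<int (*)[N][N]>(malloc(bytes));
    int *result = static_cast<int *>(malloc(bytes));
    if (!host_data || !result) {
        fprintf(stderr, "host allocation failed\n");
        return EXIT_FAILURE;
    }

    for (int x = 0; x < N; ++x)
        for (int y = 0; y < N; ++y)
            for (int z = 0; z < N; ++z)
                host_data[x][y][z] = x + y + z;

    // The 3D array is contiguous, so one linear copy moves all of it.
    int *dev_data = nullptr;
    CUDA_CHECK(cudaMalloc(&dev_data, bytes));
    CUDA_CHECK(cudaMemcpy(dev_data, host_data, bytes, cudaMemcpyHostToDevice));

    // One thread per element: z and y are tiled 16 x 16, x is the grid's z axis.
    dim3 block(16, 16, 1);
    dim3 grid(N / 16, N / 16, N);
    increment<<<grid, block>>>(dev_data);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaMemcpy(result, dev_data, bytes, cudaMemcpyDeviceToHost));

    // Check that every element came back as the original value plus one.
    long mismatches = 0;
    for (int x = 0; x < N; ++x)
        for (int y = 0; y < N; ++y)
            for (int z = 0; z < N; ++z)
                if (result[index3d(x, y, z)] != host_data[x][y][z] + 1)
                    ++mismatches;
    printf("mismatches: %ld\n", mismatches);

    CUDA_CHECK(cudaFree(dev_data));
    free(result);
    free(host_data);
    return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Because a statically sized C array is laid out contiguously, a single cudaMemcpy is enough; no per-row or per-slice copies are needed. Mapping z to threadIdx.x keeps neighbouring threads on neighbouring addresses, which is generally the friendlier access pattern for global memory.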
Update: The author of this question claims to have found a better solution (at least for 2D); you might want to have a look.