Problem description
I want to use the Thrust library to calculate the prefix sum of a device array in CUDA. My array is allocated with cudaMalloc(). My requirement is as follows:
main()
{
Launch kernel 1 on data allocated through cudaMalloc()
// This kernel will populate some data d.
Use thrust to calculate prefix sum of d.
Launch kernel 2 on prefix sum.
}
I want to use Thrust somewhere between my kernels, so I need a method to convert pointers to device iterators and back. What is wrong in the following code?
int main()
{
int *a;
cudaMalloc((void**)&a,N*sizeof(int));
thrust::device_ptr<int> d=thrust::device_pointer_cast(a);
thrust::device_vector<int> v(N);
thrust::exclusive_scan(a,a+N,v);
return 0;
}
Recommended answer
A complete working example from your latest edit would look like this:
#include <thrust/device_ptr.h>
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <thrust/fill.h>
#include <thrust/copy.h>
#include <cstdio>
int main()
{
const int N = 16;
int * a;
cudaMalloc((void**)&a, N*sizeof(int));
thrust::device_ptr<int> d = thrust::device_pointer_cast(a);
thrust::fill(d, d+N, 2);
thrust::device_vector<int> v(N);
thrust::exclusive_scan(d, d+N, v.begin());
int v_[N];
thrust::copy(v.begin(), v.end(), v_);
for(int i=0; i<N; i++)
printf("%d %d\n", i, v_[i]);
return 0;
}
What you had wrong:
- N is not defined anywhere
- passing the raw device pointer a rather than the device_ptr d as the input iterator to exclusive_scan
- passing the device_vector v to exclusive_scan rather than the appropriate iterator v.begin()
Attention to detail was all that was lacking to make this work. And work it does:
$ nvcc -arch=sm_12 -o thrust_kivekset thrust_kivekset.cu
$ ./thrust_kivekset
0 0
1 2
2 4
3 6
4 8
5 10
6 12
7 14
8 16
9 18
10 20
11 22
12 24
13 26
14 28
15 30
Edit:
thrust::device_vector.data() will return a thrust::device_ptr which points to the first element of the vector. thrust::device_ptr.get() will return a raw device pointer. Therefore
cudaMemcpy(v_, v.data().get(), N*sizeof(int), cudaMemcpyDeviceToHost);
and
thrust::copy(v.begin(), v.end(), v_);
are functionally equivalent in this example.
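To tie this back to the layout in the question (kernel 1, then a Thrust scan, then kernel 2), here is a minimal sketch of the full round trip. The kernels populate and consume, the helper run_pipeline, and the launch configuration are hypothetical names and values made up for illustration; only the Thrust and CUDA runtime calls come from the answer above. It wraps the cudaMalloc() pointer with thrust::device_pointer_cast for the scan, then uses data().get() to hand the result to the second kernel as a raw pointer.

```cuda
#include <thrust/device_ptr.h>
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical "kernel 1": writes some per-element value into d.
__global__ void populate(int *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = i % 3;
}

// Hypothetical "kernel 2": reads the prefix sum through a raw pointer.
__global__ void consume(const int *offsets, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = offsets[i] + 100;
}

// Runs kernel 1, the Thrust scan, and kernel 2; returns kernel 2's output.
// Error checking is omitted to keep the sketch short.
std::vector<int> run_pipeline(int n)
{
    int *d = nullptr;
    int *out = nullptr;
    cudaMalloc((void**)&d, n * sizeof(int));
    cudaMalloc((void**)&out, n * sizeof(int));

    const int threads = 128;  // assumed block size
    const int blocks = (n + threads - 1) / threads;
    populate<<<blocks, threads>>>(d, n);

    // Raw pointer -> Thrust: wrap d so Thrust treats it as device memory.
    thrust::device_ptr<int> d_begin = thrust::device_pointer_cast(d);
    thrust::device_vector<int> offsets(n);
    thrust::exclusive_scan(d_begin, d_begin + n, offsets.begin());

    // Thrust -> raw pointer: data() yields a device_ptr, get() unwraps it.
    int *offsets_raw = offsets.data().get();
    consume<<<blocks, threads>>>(offsets_raw, out, n);

    // cudaMemcpy blocks until the preceding work on the default stream is done.
    std::vector<int> host(n);
    cudaMemcpy(host.data(), out, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d);
    cudaFree(out);
    return host;
}

int main()
{
    std::vector<int> result = run_pipeline(16);
    for (int i = 0; i < (int)result.size(); i++)
        printf("%d %d\n", i, result[i]);
    return 0;
}
```

Everything here runs on the default stream, so the kernels and the scan execute in the order they are issued. The device_vector owns the scan output, so offsets_raw is only valid while offsets is in scope.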