This article explains how to use Thrust with user-written CUDA kernels, and may be a useful reference for anyone facing the same problem.

Problem description

I am a newbie to Thrust. I see that all Thrust presentations and examples only show host code.

I would like to know if I can pass a device_vector to my own kernel. How? If yes, what are the operations permitted on it inside kernel/device code?

Accepted answer

Thrust is purely a host side abstraction. It cannot be used inside kernels. You can pass the device memory encapsulated inside a thrust::device_vector to your own kernel like this:

thrust::device_vector< Foo > fooVector;
// Do something thrust-y with fooVector

Foo* fooArray = thrust::raw_pointer_cast( &fooVector[0] );

// Pass raw array and its size to kernel
someKernelCall<<< x, y >>>( fooArray, fooVector.size() );
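A complete sketch of that pattern follows. The Foo struct, the kernel body (doubling a field), and the launch configuration are illustrative assumptions, not part of the original answer; only the raw_pointer_cast step is:

```cuda
#include <thrust/device_vector.h>
#include <cstdio>

struct Foo { float x; };

// Hypothetical kernel: scales each element's x field in place.
__global__ void someKernelCall(Foo* data, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i].x *= 2.0f;
}

int main()
{
    thrust::device_vector<Foo> fooVector(256, Foo{1.0f});

    // Extract a raw device pointer from the vector's underlying storage.
    Foo* fooArray = thrust::raw_pointer_cast(fooVector.data());

    // Launch with enough threads to cover every element.
    someKernelCall<<<(fooVector.size() + 255) / 256, 256>>>(
        fooArray, fooVector.size());
    cudaDeviceSynchronize();

    Foo first = fooVector[0];   // copies one element back to the host
    printf("%f\n", first.x);    // each x started at 1.0 and was doubled
    return 0;
}
```

Note that the raw pointer is only valid while the device_vector is alive and its size is unchanged; a resize can reallocate the underlying storage.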

You can also use device memory not allocated by Thrust within Thrust algorithms by instantiating a thrust::device_ptr with the bare CUDA device memory pointer.
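A minimal sketch of that direction (the buffer size and the particular fill/reduce calls are illustrative choices):

```cuda
#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <thrust/reduce.h>
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    const size_t n = 1024;
    int* raw = nullptr;
    cudaMalloc(&raw, n * sizeof(int));   // plain CUDA allocation, no Thrust

    // Wrap the bare pointer so Thrust algorithms treat it as device data.
    thrust::device_ptr<int> dev_ptr(raw);
    thrust::fill(dev_ptr, dev_ptr + n, 3);
    int sum = thrust::reduce(dev_ptr, dev_ptr + n);

    printf("%d\n", sum);   // 1024 elements of 3 sum to 3072

    cudaFree(raw);
    return 0;
}
```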

Edited four and a half years later to add that, as per @JackOLantern's answer, Thrust 1.8 adds a sequential execution policy, which means you can run single-threaded versions of Thrust's algorithms on the device. Note that it still isn't possible to directly pass a Thrust device vector to a kernel, and device vectors can't be directly used in device code. Note also that dynamic parallelism is not currently supported, so you cannot have parallel Thrust execution launched from a kernel as a child grid (although that would be a potentially very interesting feature).
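A minimal sketch of the sequential policy (requires Thrust 1.8 or later; the rowSums kernel and its dimensions are hypothetical examples, not from the original answer):

```cuda
#include <thrust/execution_policy.h>
#include <thrust/reduce.h>
#include <cuda_runtime.h>
#include <cstdio>

// Each thread runs its own *sequential* Thrust reduction over one row.
__global__ void rowSums(const int* data, int* sums, int rowLen, int nRows)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < nRows)
        sums[row] = thrust::reduce(thrust::seq,
                                   data + row * rowLen,
                                   data + (row + 1) * rowLen);
}

int main()
{
    const int nRows = 4, rowLen = 8;
    int host[nRows * rowLen];
    for (int i = 0; i < nRows * rowLen; ++i) host[i] = 1;

    int *d_data, *d_sums;
    cudaMalloc(&d_data, sizeof(host));
    cudaMalloc(&d_sums, nRows * sizeof(int));
    cudaMemcpy(d_data, host, sizeof(host), cudaMemcpyHostToDevice);

    rowSums<<<1, nRows>>>(d_data, d_sums, rowLen, nRows);

    int sums[nRows];
    cudaMemcpy(sums, d_sums, sizeof(sums), cudaMemcpyDeviceToHost);
    printf("%d\n", sums[0]);   // each row of eight 1s sums to 8

    cudaFree(d_data);
    cudaFree(d_sums);
    return 0;
}
```

The key point is thrust::seq: it tells Thrust to execute the algorithm sequentially in the calling thread, which is what makes the call legal inside device code.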
