Question
I'm currently implementing a raytracer. Since raytracing is extremely computation-heavy, and since I am going to be looking into CUDA programming anyway, I was wondering if anyone has any experience with combining the two. I can't really tell if the computational models match, and I would like to know what to expect. I get the impression that it's not exactly a match made in heaven, but a decent speed increase would be better than nothing.
Answer
One thing to be very wary of in CUDA is that divergent control flow in your kernel code absolutely KILLS performance, due to the structure of the underlying GPU hardware. GPUs typically have massively data-parallel workloads with highly coherent control flow (i.e. you have a couple million pixels, each of which, or at least large swaths of which, will be operated on by the exact same shader program, even taking the same direction through all the branches). This enables them to make some hardware optimizations, like having only a single instruction cache, fetch unit, and decode logic for each group of 32 threads. In the ideal case, which is common in graphics, they can broadcast the same instruction to all 32 sets of execution units in the same cycle (this is known as SIMD, or Single-Instruction Multiple-Data). They can emulate MIMD (Multiple-Instruction Multiple-Data) and SPMD (Single-Program Multiple-Data), but when threads within a Streaming Multiprocessor (SM) diverge (take different code paths out of a branch), the issue logic actually switches between each code path on a cycle-by-cycle basis. You can imagine that, in the worst case, where all threads are on separate paths, your hardware utilization just went down by a factor of 32, effectively killing any benefit you would've had by running on a GPU over a CPU, particularly considering the overhead associated with marshalling the dataset from the CPU, over PCIe, to the GPU.
That said, ray-tracing, while data-parallel in some sense, has widely diverging control flow for even modestly complex scenes. Even if you manage to map a bunch of tightly spaced rays onto the same SM, the data and instruction locality you have for the initial bounce won't hold for very long. For instance, imagine all 32 highly coherent rays bouncing off a sphere. They will all go in fairly different directions after this bounce, and will probably hit objects made out of different materials, with different lighting conditions, and so forth. Every material and set of lighting, occlusion, etc. conditions has its own instruction stream associated with it (to compute refraction, reflection, absorption, etc.), and so it becomes quite difficult to run the same instruction stream on even a significant fraction of the threads in an SM. This problem, with the current state of the art in ray-tracing code, reduces your GPU utilization by a factor of 16-32, which may make performance unacceptable for your application, especially if it's real-time (e.g. a game). It still might be superior to a CPU for offline rendering, e.g. a render farm.
There is an emerging class of MIMD or SPMD accelerators being looked at now in the research community. I would look at these as the logical platforms for software-based, real-time raytracing.
If you're interested in the algorithms involved and in mapping them to code, check out POVRay. Also look into photon mapping; it's an interesting technique that goes one step closer to representing physical reality than raytracing.