Why do warp-level synchronization primitives exist in CUDA?

Problem description

I have two questions regarding __syncwarp() in CUDA:

1. If I understand correctly, a warp in CUDA is executed in a SIMD fashion. Does that not imply that all threads in a warp are always synchronized? If so, what exactly does __syncwarp() do, and why is it necessary?

2. Say we launch a kernel with a block size of 1024, where the threads within a block are divided into groups of 32 threads each. Each thread communicates with the other threads in its group via shared memory, but does not communicate with any thread outside its group. In such a kernel, I can see how synchronization more fine-grained than __syncthreads() may be useful, but since the warps the block is split into may not match the groups, how would one guarantee correctness when using __syncwarp()?

Recommended answer

No. There can be warp-level execution divergence (usually caused by branching, but also by other things such as warp shuffles, voting, and predicated execution), which the hardware handles by instruction replay or execution masking. Note that in "modern" CUDA, implicit warp-synchronous programming is no longer safe, so warp-level synchronization is not just desirable, it is mandatory.

Why is __syncwarp() necessary? Because there can be warp-level execution divergence, and __syncwarp() is how synchronization within a divergent warp is achieved (see the first sketch below).

As for guaranteeing correctness in the second scenario: by ensuring that the split into groups is always performed explicitly, using calculated warp boundaries (or a suitable thread mask), so that each group maps onto exactly one hardware warp (see the second sketch below).
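To illustrate the first point, here is a minimal sketch of a warp-level tree reduction in shared memory; the kernel name, variable names, and indexing are illustrative assumptions, not part of the original answer. On Volta and later GPUs, independent thread scheduling means the lanes of a warp are not guaranteed to execute in lockstep, so the __syncwarp() calls are needed to order the shared-memory accesses between reduction steps:

```cuda
// Minimal sketch (all names assumed): each warp reduces the
// 32-element slice of shared memory it owns. __syncwarp() makes the
// writes of one step visible before the reads of the next; omitting
// it is the classic "implicit warp-synchronous" bug on Volta and later.
// Assumes blockDim.x is a multiple of 32 and at most 1024.
__global__ void warp_reduce(const float* in, float* out)
{
    __shared__ float smem[1024];

    const int lane = threadIdx.x % 32;   // lane within the warp
    const int warp = threadIdx.x / 32;   // warp index within the block
    float* slice = smem + warp * 32;     // this warp's private slice

    slice[lane] = in[blockIdx.x * blockDim.x + threadIdx.x];
    __syncwarp();                        // all 32 lanes have written

    for (int offset = 16; offset > 0; offset >>= 1) {
        if (lane < offset)
            slice[lane] += slice[lane + offset];
        __syncwarp();                    // re-converge after the divergent branch
    }

    if (lane == 0)
        out[blockIdx.x * (blockDim.x / 32) + warp] = slice[0];
}
```

On pre-Volta hardware this code may happen to work even without the __syncwarp() calls, because a warp really did execute in lockstep; that is exactly why the answer stresses that implicit warp-synchronous programming is "no longer" safe.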
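For the second question, here is a sketch of the "suitable thread mask" alternative; the kernel name, the mask values, and the assumption that the block consists of a single 32-thread warp are all illustrative. Two divergent halves of a warp each cooperate through shared memory, synchronizing only among themselves by passing an explicit membership mask to __syncwarp():

```cuda
// Sketch (all names assumed; launch with blockDim.x == 32): the lower
// and upper halves of the warp take different branches, and each half
// synchronizes only its own lanes via an explicit mask.
__global__ void masked_sync(float* data)
{
    __shared__ float smem[32];
    const int lane = threadIdx.x % 32;

    const unsigned lower = 0x0000ffffu;  // lanes 0-15
    const unsigned upper = 0xffff0000u;  // lanes 16-31

    if (lane < 16) {
        smem[lane] = data[lane];
        __syncwarp(lower);               // only lanes 0-15 participate
        data[lane] = smem[15 - lane];    // safe: the writes above are ordered
    } else {
        smem[lane] = 2.0f * data[lane];
        __syncwarp(upper);               // only lanes 16-31 participate
        data[lane] = smem[47 - lane];    // reversed within the upper half
    }
}
```

The requirement is that every lane named in a mask eventually executes __syncwarp() with that same mask, and that a communicating group never straddles a warp boundary; __syncwarp() cannot synchronize threads in different warps, which is why the 32-thread groups in the question must be laid out to coincide with hardware warps (e.g. group index = threadIdx.x / 32, as in the first sketch).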