




When I am down to squeezing the last bit of performance out of a kernel, I usually find that replacing the logical operators (&& and ||) with bitwise operators (& and |) makes the kernel a little bit faster. This was observed by looking at the kernel time summary in CUDA Visual Profiler.


So, why are bitwise operators faster than logical operators in CUDA? I must admit that they are not always faster, but they often are. I wonder what magic gives this speedup.


Disclaimer: I am aware that logical operators short-circuit and bitwise operators do not. I am well aware of how these operators can be misused, resulting in wrong code. I use this replacement with care, only when the resulting logic remains the same, there is a speedup, and the speedup thus obtained matters to me :-)
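To make the "resulting logic remains the same" condition concrete, here is a minimal sketch of one way the swap can go wrong. The function names are hypothetical and only for illustration. With integer operands that are not already 0 or 1, `a & b` is not equivalent to `a && b`; comparing each operand against zero first restores the equivalence.

```cpp
// Hypothetical helpers, written as plain C++ so they compile on the host;
// the same bodies should also work as CUDA device functions.

// Reference behaviour: true when both values are nonzero.
inline bool both_nonzero_logical(int a, int b) {
    return a && b;
}

// Naive swap: WRONG for general ints, because e.g. 2 & 1 == 0.
inline bool both_nonzero_naive_bitwise(int a, int b) {
    return a & b;
}

// Safe swap: reduce each operand to a bool (0 or 1) first, then combine.
inline bool both_nonzero_bitwise(int a, int b) {
    return (a != 0) & (b != 0);
}
```

The other classic trap is an expression whose right-hand side is only valid when the left-hand side is true (for example a null check followed by a dereference); there the short circuit is part of the logic and cannot be dropped.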



Logical operators will often result in branches, particularly when the rules of short circuit evaluation need to be observed. For normal CPUs this can mean branch misprediction and for CUDA it can mean warp divergence. Bitwise operations do not require short circuit evaluation so the code flow is linear (i.e. branchless).
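As a rough illustration of the difference, here is a small sketch with two versions of the same range check. The names `HD`, `in_range_logical`, and `in_range_bitwise` are made up for this example. Whether the compiler really emits a branch for the `&&` version depends on the compiler, the optimization level, and the surrounding code; for cheap, side-effect-free operands like these it may well generate identical code for both, so the profiler is the only reliable judge.

```cpp
// HD is a hypothetical convenience macro: it expands to the CUDA function
// qualifiers under nvcc and to nothing under an ordinary C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Logical version: the second comparison must not be evaluated when the
// first is false, so the compiler may implement this with a branch.
HD inline bool in_range_logical(int x, int lo, int hi) {
    return (x >= lo) && (x < hi);
}

// Bitwise version: both comparisons are always evaluated and combined with
// a single AND. Each comparison yields a bool (0 or 1), so the result is
// the same as the logical version for every input.
HD inline bool in_range_bitwise(int x, int lo, int hi) {
    return (x >= lo) & (x < hi);
}
```

Inside a kernel, either function can be used as a predicate, for example to guard a store; the bitwise form simply gives the compiler no short-circuit rule to honour.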


