Problem description
Example 11.13 in the OpenGL Red Book, version 9 (OpenGL 4.5), is "Simple Per-Pixel Mutex". It uses imageAtomicCompSwap in a do {} while() loop to take a per-pixel lock, which is meant to prevent simultaneous access to a shared resource between pixel shader invocations corresponding to the same pixel coordinate.
    layout (binding = 0, r32ui) uniform volatile coherent uimage2D lock_image;

    void main(void)
    {
        ivec2 pos = ivec2(gl_FragCoord.xy);

        // spinlock - acquire
        uint lock_available;
        do {
            lock_available = imageAtomicCompSwap(lock_image, pos, 0, 1);
        } while (lock_available != 0);

        // do some operations protected by the lock
        do_something();

        // spinlock - release
        imageStore(lock_image, pos, uvec4(0));
    }
This example results in an APPCRASH on both Nvidia and AMD GPUs. I know that on these two platforms PS invocations are unable to progress independently of each other: a subgroup of threads is executed in lockstep, sharing control flow (a "warp" of 32 threads in Nvidia's terminology). So it may result in deadlock.
However, nowhere does the OpenGL spec mention "threads executed in lockstep". It only says "The relative order of invocations of the same shader type are undefined." Why, then, can we not use the atomic operation imageAtomicCompSwap, as in this example, to ensure exclusive access between different PS invocations? Does this mean Nvidia and AMD GPUs do not conform to the OpenGL spec?
If you are using atomic operations to lock access to a pixel, you are relying on one aspect of relative order: that all threads will eventually make forward progress. That is, you assume that any thread spinning on a lock will not starve the thread that holds the lock of its execution resources, and that the thread holding the lock will eventually make forward progress and release it.
But since the relative order of execution is undefined, there is no guarantee of any of that. And therefore, your code cannot work. Any code which relies on any aspect of ordering between the invocations of a single shader stage cannot work (unless there are specific guarantees in place).
This is precisely why ARB_fragment_shader_interlock exists.
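As a rough illustration of what that extension offers, here is a minimal sketch of a per-pixel critical section written with ARB_fragment_shader_interlock instead of a hand-rolled spinlock. The image name counter_image and the read-modify-write in the protected region are hypothetical placeholders, not part of the original question; the sketch also assumes the driver exposes the extension.

```glsl
#version 450
#extension GL_ARB_fragment_shader_interlock : require

// Request mutual exclusion per pixel, entered in primitive order.
layout(pixel_interlock_ordered) in;

// Hypothetical shared resource; stands in for whatever the lock protected.
layout(binding = 0, r32ui) uniform coherent uimage2D counter_image;

void main(void)
{
    ivec2 pos = ivec2(gl_FragCoord.xy);

    // Both interlock calls must appear in main(), outside of flow control,
    // and each must be executed exactly once.
    beginInvocationInterlockARB();

    // Protected region: a non-atomic read-modify-write on the shared image.
    uint value = imageLoad(counter_image, pos).x;
    imageStore(counter_image, pos, uvec4(value + 1u));

    endInvocationInterlockARB();
}
```

With this approach the implementation, rather than the shader, is responsible for scheduling overlapping invocations, so no assumption about forward progress is needed.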
That being said, even if there were guarantees of forward progress, your code would still be broken.
You use a non-atomic operation (imageStore) to release the lock. You should be using an atomic set operation, for example imageAtomicExchange.
Plus, as others have pointed out, you need to continue to spin if the return value from the atomic compare/swap is not zero. Remember: all atomic functions return the original value from the image. So if the original value it atomically read is not 0, then it compared false and you don't have the lock.
Now, with those fixes your code will still be undefined behavior (UB) by the spec. But it's more likely to work.
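The fixes described above, applied to the question's shader, might look like the sketch below. The empty do_something() stub and the memoryBarrier() call before the release are assumptions added for illustration; the answer itself only asks for an atomic release and for spinning while the returned value is non-zero. Nothing here adds a forward-progress guarantee, so this can still deadlock on real hardware.

```glsl
#version 450

layout (binding = 0, r32ui) uniform volatile coherent uimage2D lock_image;

// Hypothetical stand-in for the work protected by the lock.
void do_something()
{
}

void main(void)
{
    ivec2 pos = ivec2(gl_FragCoord.xy);

    // spinlock - acquire
    // imageAtomicCompSwap returns the ORIGINAL value stored in the image.
    // A return value of 0 means the compare succeeded and 1 was written,
    // i.e. this invocation now holds the lock. Any other value: keep spinning.
    uint original;
    do {
        original = imageAtomicCompSwap(lock_image, pos, 0u, 1u);
    } while (original != 0u);

    // do some operations protected by the lock
    do_something();

    // Assumption: order the protected memory writes before the release.
    memoryBarrier();

    // spinlock - release, using an atomic operation instead of imageStore
    imageAtomicExchange(lock_image, pos, 0u);
}
```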