Problem description
I want to implement this algorithm, https://dournac.org/info/gpu_sum_reduction, in a Vulkan compute shader. In OpenCL it would be easy, because I can explicitly declare which buffers are __local and which are __global. Unfortunately, I can't seem to find any such mechanism in Vulkan. Could somebody more experienced show me an example of how to get this working in Vulkan, please?
Recommended answer
Subgroups in Vulkan seem to offer equivalent functionality: they let the shader invocations within a subgroup cooperate with each other.
Perhaps something like this:
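// Requires GL_KHR_shader_subgroup_basic and GL_KHR_shader_subgroup_arithmetic.
void main(){
    // Sum this invocation's element across the whole subgroup.
    int partial_sum = subgroupAdd(arr[gl_GlobalInvocationID.x]);
    // One elected invocation per subgroup adds the partial sum to the total.
    if (subgroupElect()) {
        atomicAdd(mem, partial_sum);
    }
}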
You could study the subgroup tutorial.
Then again, you could just go about it the "normal" way, simply:
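void main(){
    // Serially sum this workgroup's chunk of the input.
    // Assumes LOCAL_SIZE equals the workgroup size and that only one invocation
    // per workgroup runs this; otherwise elements are counted more than once.
    int partial_sum = 0;
    for( int i = 0; i < LOCAL_SIZE; ++i ){
        partial_sum += arr[gl_WorkGroupID.x * gl_WorkGroupSize.x + i];
    }
    atomicAdd(mem, partial_sum); // or perhaps without atomics by recursively reducing
}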
It is not very "parallel", but then again it needs no barriers. It is just a matter of measuring performance to find what works best, which may also depend on how large you assume the input arrays to be.
Disclaimer: I have not tried these shaders, so treat them as a kind of pseudocode that may contain bugs.
It should also be possible to implement your linked algorithm nearly verbatim. The equivalent of __local in compute GLSL is shared. The workgroup memory barrier in GLSL is memoryBarrierShared(); it is normally paired with barrier(), which makes every invocation in the workgroup reach the same point before the shared data is read.
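To illustrate the shared/barrier() approach, here is a minimal sketch of a workgroup tree reduction in compute GLSL. Like the shaders above, it has not been tried, so treat it as a starting point. The binding numbers and the names scratch and partial_sums are made up for the example, and LOCAL_SIZE = 256 is an assumption; the loop only works if the workgroup size is a power of two.

```glsl
#version 450

// Assumed workgroup size; must be a power of two for the loop below.
#define LOCAL_SIZE 256
layout(local_size_x = LOCAL_SIZE) in;

// Hypothetical bindings and buffer names; adapt them to your descriptor set.
layout(std430, binding = 0) readonly buffer InputBuffer { int arr[]; };
layout(std430, binding = 1) writeonly buffer OutputBuffer { int partial_sums[]; };

// Counterpart of an OpenCL __local array: one copy per workgroup.
shared int scratch[LOCAL_SIZE];

void main() {
    uint lid = gl_LocalInvocationID.x;
    uint gid = gl_GlobalInvocationID.x;

    // Each invocation loads one element; out-of-range slots contribute 0.
    scratch[lid] = (gid < uint(arr.length())) ? arr[gid] : 0;
    memoryBarrierShared();
    barrier();

    // Tree reduction: the number of active invocations halves at every step.
    for (uint stride = LOCAL_SIZE / 2u; stride > 0u; stride >>= 1u) {
        if (lid < stride) {
            scratch[lid] += scratch[lid + stride];
        }
        memoryBarrierShared();
        barrier();
    }

    // Invocation 0 writes the partial sum of this workgroup.
    if (lid == 0u) {
        partial_sums[gl_WorkGroupID.x] = scratch[0];
    }
}
```

Each workgroup produces one partial sum, so the result still has to be combined, either by dispatching the same shader again on partial_sums until one value remains, or by replacing the final write with an atomicAdd into a single total as in the earlier snippets.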