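c++ - Custom glBlendFunc is much slower than the native one

I'm trying to implement my own custom glBlendFunc in a fragment shader. However, my solution is much slower than the native glBlendFunc, even when both perform exactly the same blend function.

I was wondering if anyone has suggestions on how to do this more efficiently.

My solution looks like this: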

void draw(fbo fbos[2], render_item item)
{
   // fbos[0] is the render target
   // fbos[1] is the previous render target used to read "background" to blend against in shader
   // Both fbos have exactly the same content, however they need to be different since we can't both read and write to the same texture. The texture we render to needs to have the entire content since we might not draw geometry everywhere.

   fbos[0]->attach(); // Attach fbo
   fbos[1]->bind(1); // Bind as texture 1

   render(item);

   glCopyTexSubImage2D(...); // copy from fbos[0] to fbos[1], fbos[1] == fbos[0]
}

Fragment shader:

vec4 blend_color(vec4 fore)
{
    vec4 back = texture2D(background, gl_TexCoord[1].st); // background is read from texture "1"
    return vec4(mix(back.rgb, fore.rgb, fore.a), back.a + fore.a);
}

Best answer

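Your best bet for improving the performance of FBO-based blending is NV_texture_barrier. Despite the name, AMD has implemented it as well, so if you stick to Radeon HD-class cards, it should be available to you.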

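Basically, it lets you ping-pong without heavyweight operations such as FBO binding or texture attachment changes. Near the bottom of the specification there is a section that shows the general algorithm.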
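As a rough illustration of that general algorithm, here is a minimal sketch of the bookkeeping it implies: track the screen area written since the last barrier, and issue a barrier only when the next item overlaps it. This is a paraphrase, not the specification's text. `Box`, `draw_with_barriers`, `draw_item` and `texture_barrier` are hypothetical names, and the barrier is passed in as a callback so the sketch runs without a GL context.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Axis-aligned screen-space rectangle, half-open: [x0, x1) x [y0, y1).
struct Box {
    int x0, y0, x1, y1;

    bool empty() const { return x0 >= x1 || y0 >= y1; }

    bool intersects(const Box& o) const {
        return !empty() && !o.empty() &&
               x0 < o.x1 && o.x0 < x1 && y0 < o.y1 && o.y0 < y1;
    }

    // Smallest box that contains both this box and o.
    Box merged(const Box& o) const {
        if (empty()) return o;
        if (o.empty()) return *this;
        return Box{std::min(x0, o.x0), std::min(y0, o.y0),
                   std::max(x1, o.x1), std::max(y1, o.y1)};
    }
};

// Draws the items in order. A barrier is issued only when an item overlaps
// the area written since the previous barrier. Returns the barrier count.
int draw_with_barriers(const std::vector<Box>& items,
                       const std::function<void(const Box&)>& draw_item,
                       const std::function<void()>& texture_barrier) {
    Box dirty{0, 0, 0, 0};  // nothing written yet
    int barriers = 0;
    for (const Box& item : items) {
        if (dirty.intersects(item)) {
            texture_barrier();        // make earlier writes visible to reads
            ++barriers;
            dirty = Box{0, 0, 0, 0};  // everything written so far is now safe
        }
        draw_item(item);
        dirty = dirty.merged(item);   // conservative bound of pending writes
    }
    return barriers;
}
```

In a real renderer the `texture_barrier` callback would call `glTextureBarrierNV()` (or `glTextureBarrier()` on OpenGL 4.5), and `draw_item` would render with the render target also bound as the background texture. The bounding box is deliberately conservative: it may trigger a barrier that was not strictly needed, but it never skips one that was.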

Another alternative is EXT_shader_image_load_store. This requires DX11 / GL 4.x class hardware. OpenGL 4.2 recently promoted it to core as ARB_shader_image_load_store.

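Even so, as Darcy said, you will never beat regular blending. It uses special hardware structures that shaders cannot access (because blending happens after the shader has run). You should only do programmatic blending if there is some effect that you absolutely cannot accomplish any other way.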
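For the specific blend in the question, regular blending does appear to be enough. The sketch below is a CPU-side check, not GPU code, that compares the question's `blend_color()` formula with what fixed-function blending computes under `glBlendFuncSeparate(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA, GL_ONE, GL_ONE)` and the default `GL_FUNC_ADD` equation. The names `Rgba`, `glsl_mix`, `shader_blend`, `fixed_function_blend` and `nearly_equal` are hypothetical. Clamping and framebuffer quantization are ignored as a simplifying assumption.

```cpp
#include <cmath>

struct Rgba { float r, g, b, a; };

// GLSL mix(x, y, t) is defined as x * (1 - t) + y * t.
inline float glsl_mix(float x, float y, float t) {
    return x * (1.0f - t) + y * t;
}

// CPU transcription of the question's blend_color().
Rgba shader_blend(const Rgba& back, const Rgba& fore) {
    return Rgba{glsl_mix(back.r, fore.r, fore.a),
                glsl_mix(back.g, fore.g, fore.a),
                glsl_mix(back.b, fore.b, fore.a),
                back.a + fore.a};
}

// Fixed-function result with GL_FUNC_ADD: src * srcFactor + dst * dstFactor.
// RGB factors:   GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA
// Alpha factors: GL_ONE, GL_ONE
Rgba fixed_function_blend(const Rgba& dst, const Rgba& src) {
    return Rgba{src.r * src.a + dst.r * (1.0f - src.a),
                src.g * src.a + dst.g * (1.0f - src.a),
                src.b * src.a + dst.b * (1.0f - src.a),
                src.a * 1.0f + dst.a * 1.0f};
}

// Component-wise comparison with a small tolerance for rounding differences.
bool nearly_equal(const Rgba& x, const Rgba& y, float eps = 1e-6f) {
    return std::fabs(x.r - y.r) <= eps && std::fabs(x.g - y.g) <= eps &&
           std::fabs(x.b - y.b) <= eps && std::fabs(x.a - y.a) <= eps;
}
```

If the two functions agree for the colors you care about, enabling `GL_BLEND` with those factors should give the same picture without the extra texture read and the `glCopyTexSubImage2D` copy. Shader-based blending is then only needed for effects that the fixed-function factors cannot express.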