Timing in an OpenGL program?

Question

I have learned enough OpenGL/GLUT (using PyOpenGL) to come up with a simple program that sets up a fragment shader, draws a full screen quad, and produces frames in sync with the display (shadertoy-style). I also to some degree understand the graphics pipeline.

What I don't understand is how the OpenGL program and the graphics pipeline fit together. In particular, in my GLUT display callback,

# set uniforms
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4)  # draw quad
glutSwapBuffers()

I suppose I activate the vertex shader by giving it vertices through glDrawArrays, which produces fragments (pixels). But then, does the fragment shader kick in immediately after glDrawArrays? There are fragments, so it can do something. On the other hand, it is still possible that there are further draw commands creating further vertices, which can a) produce new fragments and b) overwrite existing fragments.

I profiled the program and found that 99% of the time is spent in glutSwapBuffers. That is of course partially due to waiting for the vertical sync, but it stays that way when I use a very demanding fragment shader which significantly reduces the frame rate. That suggests that the fragment shader is only activated somewhere in glutSwapBuffers. Is that correct?

I understand that the fragment shader is executed on the GPU, not the CPU, but it still appears that the CPU (program) waits until the GPU (shader) is finished, within glutSwapBuffers...

Recommended answer

No. That logic is completely flawed. The main point here is that the fragment shader runs on the GPU, which works totally asynchronously to the CPU. You are not measuring the fragment shader; you are measuring some implicit CPU-GPU synchronization. It looks like your implementation syncs on the buffer swap (probably when too many frames are queued up), so all you measure is the time the CPU has to wait for the GPU. And if you increase the GPU workload without significantly increasing the CPU workload, your CPU will just spend more time waiting.

OpenGL itself does not define any of this, so all the details are ultimately completely implementation-specific. The spec only guarantees that the implementation will behave as if the fragments were generated in the order in which you draw the primitives (e.g. with blending enabled, the actual order becomes relevant even in overdraw scenarios). But at what point the fragments will be generated, and which optimizations might happen in between vertex processing and the invocation of your fragment shader, is totally out of your control. GPUs might employ tile-based rasterization schemes, where the actual fragment shading is delayed a bit (if possible) to improve efficiency and avoid overshading.

Note that most GPU drivers work completely asynchronously. When you call a gl*() command, it returns before it has been processed. It might only be queued up for later processing (e.g. in another driver thread), and will ultimately be transformed into some GPU-specific command buffers which are transferred to the GPU. You might end up with implicit CPU-GPU synchronization (or CPU-CPU synchronization with a driver thread). For example, when you read back framebuffer data after a draw call, all previous GL commands have to be flushed for processing, and the CPU will wait for the processing to be done before retrieving the image data. That is also what makes such readbacks so slow.
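To make the queueing behaviour described above more concrete, here is a toy model in plain Python. It is only a sketch and does not touch OpenGL at all: a worker thread stands in for the GPU, a bounded queue stands in for the driver's limit on frames in flight, and the names `run_frames`, `gpu_frame_cost` and `max_queued` are made up for this illustration. Real drivers decide for themselves where and when they block.

```python
import queue
import threading
import time


def run_frames(n_frames, gpu_frame_cost=0.02, max_queued=2):
    """Toy model: return (draw_time, swap_time) as seen by the CPU thread."""
    # Bounded queue = hypothetical driver limit on queued frames.
    commands = queue.Queue(maxsize=max_queued)

    def fake_gpu():
        # Stand-in for the GPU: works through queued frames at its own pace.
        while True:
            frame = commands.get()
            if frame is None:           # sentinel: no more frames
                return
            time.sleep(gpu_frame_cost)  # pretend the fragment shader runs here

    worker = threading.Thread(target=fake_gpu)
    worker.start()

    draw_time = 0.0
    swap_time = 0.0
    for _ in range(n_frames):
        t0 = time.perf_counter()
        frame = ["set uniforms", "draw quad"]  # recording commands is cheap
        t1 = time.perf_counter()
        commands.put(frame)  # "swap": blocks while too many frames are queued
        t2 = time.perf_counter()
        draw_time += t1 - t0
        swap_time += t2 - t1

    commands.put(None)
    worker.join()
    return draw_time, swap_time


if __name__ == "__main__":
    draw, swap = run_frames(10)
    print(f"time in 'draw': {draw:.6f} s, time in 'swap': {swap:.6f} s")
```

In this model nearly all of the CPU thread's time is attributed to the "swap" step, although all the simulated shading happens on the worker thread. Raising `gpu_frame_cost` only makes the "swap" step look slower, which is the same pattern the profiler showed for glutSwapBuffers.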

As a consequence, any CPU-side timing of OpenGL code is completely meaningless. You need to measure the timing on the GPU, and that's what Timer Queries are for.

