Question
I have a vertex buffer backed by device memory that is host-visible and host-coherent.
To write to the vertex buffer on the host side, I map the device memory, memcpy into it, and unmap it.
To read from it, I bind the vertex buffer in a command buffer while recording a render pass. These command buffers are submitted in a loop that acquires, submits, and presents to draw each frame.
Currently I write to the vertex buffer once at program startup.
The vertex buffer then remains the same during the loop.
I'd like to modify the vertex buffer from the host side between frames.
What I'm not clear on is the best/right way to synchronize these host-side writes with the device-side reads. Currently I have a fence and a pair of semaphores for each frame allowed in flight simultaneously.
For each frame:
I wait on the fence.
I reset the fence.
The acquire signals semaphore #1.
The queue submit waits on semaphore #1 and signals semaphore #2 and signals the fence.
The present waits on semaphore #2.
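In Vulkan C terms, one iteration of that loop looks roughly like the sketch below. All handles and names (the per-frame fence, `acquireSem` as semaphore #1, `renderSem` as semaphore #2) are placeholders for whatever the real code uses:

```c
#include <stdint.h>
#include <vulkan/vulkan.h>

/* One frame of the acquire/submit/present loop described above.
 * fence, acquireSem (#1), and renderSem (#2) are per-frame-in-flight
 * objects; everything here is illustrative, not a complete renderer. */
void draw_frame(VkDevice dev, VkQueue queue, VkSwapchainKHR swapchain,
                VkFence fence, VkSemaphore acquireSem, VkSemaphore renderSem,
                VkCommandBuffer cmd) {
    uint32_t imageIndex;

    vkWaitForFences(dev, 1, &fence, VK_TRUE, UINT64_MAX); /* wait on the fence */
    vkResetFences(dev, 1, &fence);                        /* reset the fence  */

    /* The acquire signals semaphore #1. */
    vkAcquireNextImageKHR(dev, swapchain, UINT64_MAX, acquireSem,
                          VK_NULL_HANDLE, &imageIndex);

    /* The submit waits on #1, signals #2, and signals the fence. */
    VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
    VkSubmitInfo submit = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .waitSemaphoreCount = 1, .pWaitSemaphores = &acquireSem,
        .pWaitDstStageMask = &waitStage,
        .commandBufferCount = 1, .pCommandBuffers = &cmd,
        .signalSemaphoreCount = 1, .pSignalSemaphores = &renderSem,
    };
    vkQueueSubmit(queue, 1, &submit, fence);

    /* The present waits on semaphore #2. */
    VkPresentInfoKHR present = {
        .sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,
        .waitSemaphoreCount = 1, .pWaitSemaphores = &renderSem,
        .swapchainCount = 1, .pSwapchains = &swapchain,
        .pImageIndices = &imageIndex,
    };
    vkQueuePresentKHR(queue, &present);
}
```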
Where is the right place in this to put the host-side map/memcpy/unmap and how should I synchronize it properly with the device reads?
Answer
If you want to take advantage of asynchronous GPU execution, you want the CPU to avoid having to stall for GPU operations. So never wait on a fence for a batch that was just issued. The same goes for memory: you should never want to write to memory that is being read by a GPU operation you just submitted.
You should at least double-buffer things. If you are changing vertex data every frame, you should allocate enough memory to hold two copies of that data. There's no need to make multiple allocations, or even multiple VkBuffers (just make the allocation and buffer bigger, then select which region of storage to use when binding). While one region of storage is being read by GPU commands, you write to the other.
Each batch you submit reads from certain memory. As such, the fence for that batch will be set when the GPU is finished reading from that memory. So if you want to write to that memory from the CPU, you cannot begin until the fence for the GPU operation that reads it has been set.
But because you're double-buffering like this, the fence for the memory you're about to write to is not the fence for the batch you submitted last frame; it's the fence for the batch you submitted the frame before that. Since it's been some time since the GPU received that operation, it is far less likely that the CPU will actually have to wait. That is, the fence should hopefully already be set.
Now, you shouldn't do a literal vkWaitForFences on that fence. You should check whether it is set, and if it isn't, go do something else useful with your time. But if you have nothing else useful to do, then waiting is probably OK (rather than sitting and spinning on a test).
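A sketch of that non-blocking check; the device and fence handles are assumed to exist already:

```c
#include <vulkan/vulkan.h>

/* Poll the fence instead of blocking on it. Returns 1 if the region is
 * safe to write, 0 if the GPU may still be reading it. */
int region_writable(VkDevice device, VkFence fence) {
    /* vkGetFenceStatus returns VK_SUCCESS when the fence is signaled and
     * VK_NOT_READY when it is not, without blocking the caller. */
    return vkGetFenceStatus(device, fence) == VK_SUCCESS;
}

/* Caller, per frame:
 *   if (!region_writable(dev, fence)) { do other useful work; }
 *   else { vkResetFences(dev, 1, &fence); write the vertex data; } */
```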
Once the fence is set, you know that you can freely write to the memory.
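That host-side write, for host-visible and host-coherent memory, can be a plain map/memcpy/unmap. A sketch, with all handles, the offset, and the data assumed to come from elsewhere:

```c
#include <string.h>
#include <vulkan/vulkan.h>

/* Write one region of a HOST_VISIBLE | HOST_COHERENT allocation.
 * Coherent memory needs no vkFlushMappedMemoryRanges after the write. */
VkResult write_region(VkDevice device, VkDeviceMemory memory,
                      VkDeviceSize offset, const void *data, VkDeviceSize size) {
    void *mapped = NULL;
    VkResult r = vkMapMemory(device, memory, offset, size, 0, &mapped);
    if (r != VK_SUCCESS)
        return r;
    memcpy(mapped, data, (size_t)size);
    vkUnmapMemory(device, memory);
    return VK_SUCCESS;
}
```

Many applications keep the memory persistently mapped for the program's lifetime instead of mapping and unmapping every frame; mapping is not required to happen around each write.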
How do I know that the memory I wrote with memcpy has finished being transferred to the device before it is read by the render pass?
You know because the memory is coherent. That is what VK_MEMORY_PROPERTY_HOST_COHERENT_BIT means in this context: host changes to device memory are visible to the GPU without needing explicit visibility operations, and vice versa.
Well… almost.
If you want to avoid having to use any synchronization, you must call vkQueueSubmit for the reading batch after you have finished modifying the memory on the CPU. If they get called in the wrong order, then you'll need a memory barrier. For example, you could have some part of the batch wait on an event set by the host (through vkSetEvent), which tells the GPU when you've finished writing. That way, you could submit that batch before performing the memory writes. But in this case, the vkCmdWaitEvents call should include a source stage mask of HOST (since that's who's setting the event), and it should have a memory barrier whose source access flags also include HOST_WRITE (since that's who's writing to the memory).
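A sketch of that event-based path; the event and command buffer are assumed to exist, and the destination stage/access are chosen for vertex-buffer reads specifically:

```c
#include <vulkan/vulkan.h>

/* Record a wait on a host-signaled event, with a memory barrier that makes
 * host writes visible to vertex attribute reads. Note that a wait with a
 * HOST source stage must be recorded outside a render pass instance. */
void record_wait_for_host_write(VkCommandBuffer cmd, VkEvent hostEvent) {
    VkMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_HOST_WRITE_BIT,           /* who wrote  */
        .dstAccessMask = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT /* who reads  */
    };
    vkCmdWaitEvents(cmd, 1, &hostEvent,
                    VK_PIPELINE_STAGE_HOST_BIT,         /* who sets the event */
                    VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, /* who must wait      */
                    1, &barrier, 0, NULL, 0, NULL);
}

/* Later, on the CPU, after the memcpy: vkSetEvent(device, hostEvent); */
```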
But in most cases, it's easier to just write to the memory before submitting the batch. That way, you avoid needing to use host/event synchronization.