Problem description
In CUDA, how is stream 0 related to other streams? Does stream 0 (the default stream) execute concurrently with other streams in a context or not?
Consider the following example:
cudaMemcpy(Dst, Src, sizeof(float)*datasize, cudaMemcpyHostToDevice); // stream 0
cudaStream_t stream1;
/* ...creating stream1... */
somekernel<<<blocks, threads, 0, stream1>>>(Dst); // stream 1
In the above code, can the compiler ensure that somekernel always launches AFTER cudaMemcpy finishes, or will somekernel execute concurrently with cudaMemcpy?
Recommended answer
The cudaMemcpy call is (in all but a particular case) a synchronous call. The host thread running that code blocks until the memory transfer has completed. It cannot proceed to launch the kernel until the cudaMemcpy call has returned, and that doesn't happen until the copy operation is completed.
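To make that ordering concrete, here is a minimal sketch that fills in the example from the question. The kernel body, the helper name copy_then_launch, and the sizes are assumptions for illustration only, since the original post shows neither what somekernel does nor how stream1 is created; cudaStreamCreate is used here as one plausible way to create it.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel body: the original post does not show what somekernel does.
__global__ void somekernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: returns true if the kernel operated on the copied data.
bool copy_then_launch(int datasize)
{
    std::vector<float> src(datasize, 1.0f), out(datasize, 0.0f);
    size_t bytes = sizeof(float) * datasize;
    float *dst = nullptr;
    if (cudaMalloc(&dst, bytes) != cudaSuccess) return false;

    // Blocking call issued to stream 0: the host thread does not reach the
    // kernel launch below until cudaMemcpy has returned.
    cudaMemcpy(dst, src.data(), bytes, cudaMemcpyHostToDevice);

    // Assumption: stream1 is an ordinary (blocking) stream.
    cudaStream_t stream1;
    cudaStreamCreate(&stream1);

    int threads = 256;
    int blocks = (datasize + threads - 1) / threads;
    somekernel<<<blocks, threads, 0, stream1>>>(dst, datasize);

    // Wait for the kernel, then copy the result back for checking.
    cudaStreamSynchronize(stream1);
    cudaMemcpy(out.data(), dst, bytes, cudaMemcpyDeviceToHost);

    bool ok = true;
    for (int i = 0; i < datasize; ++i)
        if (out[i] != 2.0f) ok = false;   // 1.0f copied in, +1.0f in the kernel

    cudaStreamDestroy(stream1);
    cudaFree(dst);
    return ok;
}

int main()
{
    printf("%s\n", copy_then_launch(1 << 20) ? "ordered" : "unexpected result");
    return 0;
}
```

If the kernel could start before the copy, some elements would not end up as 2.0f; with the blocking cudaMemcpy ahead of the launch, the check should pass on a working device.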
More generally, the default stream (0 or null) implicitly serializes operations on the GPU whenever an operation is active in that stream. If you create streams and push operations into them while an operation is being performed in the default stream, all concurrency in those streams is lost until the default stream is idle.
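The serialization described above can be observed with a small experiment. The sketch below is an illustration under stated assumptions: the kernel names and the helper default_stream_orders_work are made up for this example, both streams are created with plain cudaStreamCreate, and the program is compiled with nvcc's default (legacy) default-stream behavior. Three kernels touch the same buffer from three different streams with no explicit synchronization between them.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernels, each touching every element of the same buffer.
__global__ void set_one(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = 1.0f;
}
__global__ void add_one(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}
__global__ void times_ten(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 10.0f;
}

// Hypothetical helper: returns true if the three kernels ran in launch order.
bool default_stream_orders_work(int n)
{
    size_t bytes = sizeof(float) * n;
    float *d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    set_one<<<blocks, threads, 0, s1>>>(d, n);    // stream s1
    add_one<<<blocks, threads>>>(d, n);           // default stream: waits for s1
    times_ten<<<blocks, threads, 0, s2>>>(d, n);  // stream s2: waits for default stream

    cudaDeviceSynchronize();

    std::vector<float> out(n, 0.0f);
    cudaMemcpy(out.data(), d, bytes, cudaMemcpyDeviceToHost);

    bool ok = true;
    for (int i = 0; i < n; ++i)
        if (out[i] != 20.0f) ok = false;          // (1 + 1) * 10

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(d);
    return ok;
}

int main()
{
    printf("%s\n", default_stream_orders_work(1 << 20) ? "serialized" : "not serialized");
    return 0;
}
```

Without the kernel in the default stream sitting between them, the kernels in s1 and s2 would have no ordering guarantee relative to each other, so this pattern should not be relied on as a substitute for explicit events or synchronization calls.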