How does cuBLAS implement asynchronous scalar variable transfer?

Question

Many cuBLAS and cuSPARSE function calls take scalar arguments that can be passed through either a host pointer or a device pointer, such as the alpha and beta arguments described here: http://docs.nvidia.com/cuda/cublas/#cublas-lt-t-gt-gemm
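For context, the two ways of passing the scalar can be sketched as follows. This is a minimal example rather than anything from the original thread: the helper name `saxpy_both_modes` is hypothetical, error checking is omitted for brevity, and the program is assumed to be linked against cuBLAS (for example with `-lcublas`). It calls `cublasSaxpy` once with alpha in host memory and once with alpha in device memory, switching modes with `cublasSetPointerMode`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper: computes y = alpha * x + y twice on vectors with
// x[i] = 1 and y[i] = 0, once per pointer mode, and returns y[0].
// Error checking is omitted to keep the sketch short.
float saxpy_both_modes(int n, float alpha) {
    std::vector<float> h_x(n, 1.0f), h_y(n, 0.0f);
    float *d_x = nullptr, *d_y = nullptr, *d_alpha = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMalloc(&d_alpha, sizeof(float));
    cudaMemcpy(d_x, h_x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_alpha, &alpha, sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // Host pointer mode: alpha is read from host memory.
    cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_HOST);
    cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);

    // Device pointer mode: alpha is read from device memory.
    cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE);
    cublasSaxpy(handle, n, d_alpha, d_x, 1, d_y, 1);

    // Blocking copy back to the host; it also waits for both calls above.
    cudaMemcpy(h_y.data(), d_y, n * sizeof(float), cudaMemcpyDeviceToHost);

    cublasDestroy(handle);
    cudaFree(d_x);
    cudaFree(d_y);
    cudaFree(d_alpha);
    return h_y[0];
}

int main() {
    printf("y[0] = %f\n", saxpy_both_modes(1024, 2.0f));
    return 0;
}
```

Each call adds alpha * x to y, so after both calls every element of y should equal 2 * alpha.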

How is this actually implemented? If the scalar is in host memory, I assume the library would need to allocate memory on the device and then call cudaMemcpyAsync to copy the data. However, calling cudaMalloc would make the function call synchronous. How can this problem be solved?

Recommended answer

If it's a host-resident scalar, it can be passed by value as a kernel parameter. If it's device-resident, a pointer to it can be passed as a kernel parameter instead. Either way, no device allocation or explicit copy is needed, so the call can stay asynchronous.
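The idea in this answer can be sketched with a pair of hand-written kernels. This is an illustration under stated assumptions, not cuBLAS's actual source: the kernels `scale_by_value` and `scale_by_pointer`, the `ScalarLocation` enum, and the `scale_async` and `demo_scale` helpers are all hypothetical names, and error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// The scalar arrives by value as a kernel parameter, so no device
// allocation or explicit copy is needed for it.
__global__ void scale_by_value(float alpha, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] *= alpha;
}

// The scalar already lives in device memory; only the pointer is a
// kernel parameter, and it is dereferenced when the kernel runs.
__global__ void scale_by_pointer(const float *alpha, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] *= *alpha;
}

// Hypothetical stand-in for a library's pointer-mode setting.
enum class ScalarLocation { Host, Device };

// Hypothetical dispatcher: picks a kernel based on where alpha lives.
void scale_async(ScalarLocation where, const float *alpha,
                 float *d_y, int n, cudaStream_t stream) {
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (where == ScalarLocation::Host) {
        // The host value is read here, at launch time.
        scale_by_value<<<blocks, threads, 0, stream>>>(*alpha, d_y, n);
    } else {
        scale_by_pointer<<<blocks, threads, 0, stream>>>(alpha, d_y, n);
    }
}

// Scales a vector of ones by a host scalar (2) and then by a device
// scalar (3), and returns the first element.
float demo_scale() {
    const int n = 1024;
    std::vector<float> h_y(n, 1.0f);
    float h_alpha = 2.0f, three = 3.0f;
    float *d_y = nullptr, *d_alpha = nullptr;
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMalloc(&d_alpha, sizeof(float));
    cudaMemcpy(d_y, h_y.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_alpha, &three, sizeof(float), cudaMemcpyHostToDevice);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    scale_async(ScalarLocation::Host, &h_alpha, d_y, n, stream);
    scale_async(ScalarLocation::Device, d_alpha, d_y, n, stream);
    cudaStreamSynchronize(stream);

    cudaMemcpy(h_y.data(), d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaStreamDestroy(stream);
    cudaFree(d_y);
    cudaFree(d_alpha);
    return h_y[0];
}

int main() {
    printf("y[0] = %f\n", demo_scale());
    return 0;
}
```

In the by-value path the host scalar is read when the launch is issued, so the caller can change it as soon as `scale_async` returns. In the by-pointer path the value is read when the kernel executes, so earlier work on the same stream could have produced it.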

