Question
In my CUDA program I see large variability between different runs (up to 50%) in communication time, which includes host-to-device and device-to-host data transfer times over PCI Express for pinned memory. How can I explain this variability? Does it happen when the PCI controller and memory controller are busy performing other PCIe transfers? Any insight/reference is greatly appreciated. The GPU is a Tesla K20c, and the host is an AMD Opteron 6168 with 12 cores running the Linux operating system. The PCI Express version is 2.0.
Answer
The system you are doing this on is a NUMA system, which means that each of the two discrete CPUs in your host (the Opteron 6168 has two 6-core CPUs in a single package) has its own memory controller, and there may be a different number of HyperTransport hops between each CPU's memory and the PCIe controller hosting your CUDA device.
This means that, depending on CPU affinity, the thread which runs your bandwidth tests may have different latency to both host memory and the GPU. This would explain the differences in timings which you are seeing.
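One way to check this is to time the same pinned-memory transfer repeatedly and look at the spread, then repeat the experiment with the process bound to each NUMA node. The following is a minimal sketch (not the asker's actual benchmark) using standard CUDA runtime calls; the buffer size and run count are arbitrary choices, and it assumes a single CUDA device.

```cuda
// Minimal sketch: time repeated host-to-device copies of a pinned buffer
// and report the min/max spread across runs. Assumes device 0 is the K20c.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;   // 64 MiB per transfer (arbitrary)
    const int    runs  = 20;

    float *h_buf = nullptr, *d_buf = nullptr;
    cudaHostAlloc(&h_buf, bytes, cudaHostAllocDefault);  // pinned host memory
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    float min_ms = 1e30f, max_ms = 0.0f;
    for (int i = 0; i < runs; ++i) {
        cudaEventRecord(start);
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        if (ms < min_ms) min_ms = ms;
        if (ms > max_ms) max_ms = ms;
    }
    printf("H2D: min %.3f ms, max %.3f ms, spread %.1f%%\n",
           min_ms, max_ms, 100.0f * (max_ms - min_ms) / min_ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
```

If NUMA placement is the cause, running the binary under `numactl --cpunodebind=<node> --membind=<node>` for each node in turn should show consistently better (and less variable) timings on the node closest to the GPU's PCIe root complex, since both the benchmarking thread and the pinned allocation then live on that node.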