Question
Does Hyper-Threading allow two threads, executing simultaneously on the two logical cores of a single physical core, to exchange data through the L1 cache?
With the proviso that both threads belong to the same process, i.e. share the same address space.
Page 85 (2-55) - Intel® 64 and IA-32 Architectures Optimization Reference Manual: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
...
Deeper buffering and enhanced resource sharing/partition policies:
- Replicated resources for HT operation: register state, renamed return stack buffer, large-page ITLB.
- Partitioned resources for HT operation: load buffers, store buffers, re-order buffers, and the small-page ITLB are statically allocated between the two logical processors.
- Competitively-shared resources during HT operation: the reservation station, cache hierarchy, fill buffers, both DTLB0 and STLB.
- Alternating during HT operation: front-end operation generally alternates between the two logical processors to ensure fairness.
- HT-unaware resources: execution units.
Accepted answer
The Intel Architecture Software Optimization manual has a brief description of how processor resources are shared between HT threads on a core in chapter 2.3.9. It is documented for the Nehalem architecture, so it is getting stale, but it is fairly likely to still be relevant for current architectures since the partitioning is logically consistent:
- Duplicated for each HT thread: the registers, the return stack buffer, the large-page ITLB
- Statically allocated for each HT thread: the load, store, and re-order buffers, the small-page ITLB
- Competitively shared between HT threads: the reservation station, the caches, the fill buffers, DTLB0 and STLB.
Your question matches the third bullet. In the very specific case of each HT thread executing code from the same process, somewhat by accident, you can generally expect L1 and L2 to contain data retrieved by one HT thread that is useful to the other. Keep in mind that the unit of storage in the caches is a cache line of 64 bytes. Just in case: this is not otherwise a good reason to pursue a thread-scheduling approach that favors getting two HT threads to execute on the same core, assuming your OS supports that. An HT thread generally runs quite a bit slower than a thread that gets the core to itself; 30% is the usual number bandied about, YMMV.