




The primary idea behind HT/SMT was that when one thread stalls, another thread on the same core can co-opt the rest of that core's idle time and run with it, transparently.

ARM no longer supports SMT (for energy reasons). AMD never supported it. In the wild, we still have various processors that support it.


From my perspective, if data and algorithms are built to avoid cache misses and subsequent processing stalls at all costs, surely HT is a redundant factor in multi-core systems? While I appreciate that there is low overhead to the context-switching involved since the two HyperThreads' discrete hardware exists within the same physical core, I cannot see that this is better than no context switching at all.


I'm suggesting that any need for HyperThreading points to flawed software design. Is there anything I am missing here?



Whether hyper-threading helps, and by how much, depends very much on what the threads are doing. It isn't just about doing work in one thread while the other thread waits on I/O or a cache miss, although that is a big part of the rationale. It is about using the CPU's resources efficiently to increase total system throughput. Suppose you have two threads:

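  1. One thread has a lot of data cache misses (poor spatial locality) and does not use floating point. The poor spatial locality is not necessarily because the programmer did a bad job; some workloads are inherently that way.
  2. The other thread is streaming data from memory and doing floating point calculations.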


With hyper-threading these two threads can share the same CPU: one is doing integer operations, getting cache misses and stalling, while the other is using the floating point unit with the data prefetcher running well ahead, anticipating the sequential data from memory. The system throughput is better than if the O/S alternately scheduled both threads on the same CPU core.
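To make the throughput argument concrete, here is a minimal toy model in Python. It is a sketch under loud assumptions, not a cycle-accurate simulation: the core has exactly one integer unit and one floating point unit, each thread needs only one of them, a cache miss is a fixed stall, and outstanding misses keep resolving while a thread is descheduled. The names `ToyThread`, `run_smt`, `run_time_sliced` and `make_pair`, and all the numbers, are hypothetical and chosen only for illustration.

```python
class ToyThread:
    """Hypothetical thread: needs one execution unit and may stall on misses."""

    def __init__(self, unit, ops, miss_every=0, miss_penalty=0):
        self.unit = unit                  # "int" or "fp" (assumed single unit each)
        self.ops = ops                    # operations to retire
        self.miss_every = miss_every      # a miss follows every N ops (0 = never)
        self.miss_penalty = miss_penalty  # stall cycles per miss
        self.done = 0
        self.stall = 0

    def finished(self):
        return self.done >= self.ops

    def issue(self):
        """Retire one operation; possibly start a stall afterwards."""
        self.done += 1
        if self.miss_every and self.done % self.miss_every == 0:
            self.stall = self.miss_penalty


def run_smt(threads):
    """Every cycle, each ready thread may issue if its unit is still free."""
    cycles = 0
    while not all(t.finished() for t in threads):
        busy = set()
        for t in threads:
            if t.stall > 0:
                t.stall -= 1
            elif not t.finished() and t.unit not in busy:
                busy.add(t.unit)
                t.issue()
        cycles += 1
    return cycles


def run_time_sliced(threads, quantum=500):
    """Only one thread owns the core at a time; switch every `quantum` cycles."""
    cycles = 0
    current = 0
    while not all(t.finished() for t in threads):
        if threads[current].finished() or (cycles > 0 and cycles % quantum == 0):
            current = (current + 1) % len(threads)
            while threads[current].finished():
                current = (current + 1) % len(threads)
        for i, t in enumerate(threads):
            if t.stall > 0:
                t.stall -= 1          # memory keeps working while descheduled
            elif i == current and not t.finished():
                t.issue()
        cycles += 1
    return cycles


def make_pair():
    # Thread 1: integer work with frequent misses. Thread 2: streaming FP work.
    return [ToyThread("int", 1000, miss_every=10, miss_penalty=20),
            ToyThread("fp", 3000)]


if __name__ == "__main__":
    print("SMT cycles:        ", run_smt(make_pair()))
    print("time-sliced cycles:", run_time_sliced(make_pair()))
```

In this model the complementary pair finishes under SMT in about the time the longer thread would need on its own, whereas time slicing can never beat the sum of both threads' issue cycles. Two threads that compete for the same unit and never stall gain nothing from SMT here, which is the sense in which the benefit depends on what the threads are doing.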


Intel chose not to include hyper-threading in Silvermont, but that doesn't mean it will do away with it in high-end Xeon server processors, or even in processors targeted at laptops. Choosing the micro-architecture for a processor involves trade-offs, and there are many considerations:

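  1. What is the target market (what kinds of applications will be run)?
  2. What is the target transistor technology?
  3. What is the performance target?
  4. What is the power budget?
  5. What is the target die size (affects yield)?
  6. Where does it fit in the price/performance spectrum of the company's future products?
  7. What is the target launch date?
  8. How many resources are available to implement and verify the design? Adding micro-architectural features adds complexity that grows non-linearly, because there are subtle interactions with other features. The goal is to identify as many bugs as possible before the first "tapeout", to minimize how many "steppings" have to be done before you have a working chip.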

Silvermont's die size budget per core and power budget precluded having both out-of-order execution and hyper-threading, and out-of-order execution gives better single-threaded performance. Here's Anandtech's assessment:

Previous versions of Atom used Hyper Threading to get good utilization of execution resources. Hyper Threading had a power penalty associated with it, but the performance uplift was enough to justify it. At 22nm, Intel had enough die area (thanks to transistor scaling) to just add in more cores rather than rely on HT for better threaded performance so Hyper Threading was out. The power savings Intel got from getting rid of Hyper Threading were then allocated to making Silvermont an out-of-order design, which in turn helped drive up efficient use of the execution resources without HT. It turns out that at 22nm the die area Intel would’ve spent on enabling HT was roughly the same as Silvermont’s re-order buffer and OoO logic, so there wasn’t even an area penalty for the move.


08-06 22:34