Question

Are there any still-relevant CPUs (Intel/AMD/Atom) which don't support SSSE3 instructions?

What's the most recent CPU without SSSE3?

Recommended answer

The most recent CPUs without SSSE3 are based on the AMD K10 microarchitecture:

  • AMD Phenom II, the last-generation K10 socketed desktop CPUs before Bulldozer-family. They were produced from 2008 to 2012.
  • AMD Llano APUs, introduced June 2011. (Bulldozer-based APUs were introduced Oct 2012; I don't know when the last Llano APUs were made / sold.) Also based on K10 cores, but reporting CPUID "family" = 12h.
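
To make the "family = 12h" remark concrete, here is a minimal sketch of how the family number is usually derived from EAX of CPUID leaf 1. The function name and the two sample EAX values are illustrative assumptions, not values read from real hardware.

```c
#include <stdint.h>

/* Decode the "display family" from EAX of CPUID leaf 1.
 * Base family is bits 11:8 and extended family is bits 27:20.
 * The extended field is only added when the base family is 0xF. */
unsigned display_family(uint32_t leaf1_eax)
{
    unsigned base = (leaf1_eax >> 8) & 0xFu;
    unsigned ext  = (leaf1_eax >> 20) & 0xFFu;
    return (base == 0xFu) ? base + ext : base;
}

/* Illustrative signatures (assumed for the example):
 *   0x00100F43 -> base 0xF + ext 0x01 = family 10h (K10)
 *   0x00300F10 -> base 0xF + ext 0x03 = family 12h (Llano) */
```

So a family check alone would not group Llano with the other K10 parts; checking the feature bits is more robust than matching family numbers.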

K10 CPUs support SSE3 (FP instructions like movddup and haddps), and AMD-only SSE4a. Some early K8 cores only have SSE2, but later K8 also had SSE3.
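
As a rough sketch of how that feature combination shows up in CPUID: SSE3 and SSSE3 are bits 0 and 9 of ECX in leaf 1, and SSE4a is bit 6 of ECX in leaf 0x80000001. The struct and function names below are made up for illustration, and the hardware query assumes GCC or Clang's `<cpuid.h>` on x86.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_SSE3      (1u << 0)  /* leaf 1, ECX bit 0 */
#define CPUID1_ECX_SSSE3     (1u << 9)  /* leaf 1, ECX bit 9 */
#define CPUID_EXT1_ECX_SSE4A (1u << 6)  /* leaf 0x80000001, ECX bit 6 (AMD only) */

struct sse_features { bool sse3, ssse3, sse4a; };

/* Pure decoding step: takes raw ECX values, touches no hardware. */
struct sse_features decode_sse_features(uint32_t leaf1_ecx, uint32_t ext1_ecx)
{
    struct sse_features f;
    f.sse3  = (leaf1_ecx & CPUID1_ECX_SSE3) != 0;
    f.ssse3 = (leaf1_ecx & CPUID1_ECX_SSSE3) != 0;
    f.sse4a = (ext1_ecx & CPUID_EXT1_ECX_SSE4A) != 0;
    return f;
}

/* SSE3 and SSE4a without SSSE3 is the combination described for K10. */
bool looks_like_k10_feature_set(struct sse_features f)
{
    return f.sse3 && f.sse4a && !f.ssse3;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
/* Query the running CPU. Returns false if leaf 1 is unavailable. */
bool query_sse_features(struct sse_features *out)
{
    unsigned eax, ebx, ecx1, ecx_ext, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx1, &edx))
        return false;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx_ext, &edx))
        ecx_ext = 0;  /* extended leaf missing: treat SSE4a as absent */
    *out = decode_sse_features(ecx1, ecx_ext);
    return true;
}
#endif
```

Keeping the decoding separate from the CPUID call makes the bit logic checkable on any machine.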

Notice that AMD CPUs listed in https://en.wikipedia.org/wiki/SSSE3#CPUs_with_SSSE3 only start at Bulldozer, but do include AMD's low-power Bobcat / Jaguar CPUs.

If you google AMD Phenom II ssse3, you'll find some pages about some games removing an SSSE3 requirement so they can work on Phenom II.
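
Dropping a hard SSSE3 requirement usually means runtime dispatch: use the SSSE3 path when the CPU has it and fall back to plain code otherwise. Below is a minimal sketch, assuming GCC or Clang on x86 (for `__builtin_cpu_supports` and the `target` attribute). The 16-byte reversal and all function names are hypothetical, chosen only because pshufb does it in one instruction.

```c
#include <stdint.h>
#include <string.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <tmmintrin.h>  /* SSSE3 intrinsics, including _mm_shuffle_epi8 (pshufb) */
#define HAVE_SSSE3_PATH 1

/* Compiled with SSSE3 enabled for this one function only. */
__attribute__((target("ssse3")))
static void reverse16_ssse3(uint8_t *dst, const uint8_t *src)
{
    /* Result byte i takes source byte 15 - i. */
    const __m128i idx = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                      7, 6, 5, 4, 3, 2, 1, 0);
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, _mm_shuffle_epi8(v, idx));
}
#endif

/* Portable fallback for CPUs without SSSE3 (e.g. Phenom II). */
static void reverse16_scalar(uint8_t *dst, const uint8_t *src)
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = src[15 - i];
    memcpy(dst, tmp, sizeof tmp);
}

/* Reverse 16 bytes, choosing the implementation at run time. */
void reverse16(uint8_t *dst, const uint8_t *src)
{
#ifdef HAVE_SSSE3_PATH
    if (__builtin_cpu_supports("ssse3")) {
        reverse16_ssse3(dst, src);
        return;
    }
#endif
    reverse16_scalar(dst, src);
}
```

Real code would normally do the feature check once and cache a function pointer rather than test on every call.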

On Intel you have to go back as far as Pentium M / the original Core (Core Solo / Core Duo), because SSSE3 was introduced with Core 2. (First-gen Core 2 (Conroe/Merom) only has 64-bit wide shuffle execution units, so pshufb is relatively slow. But so is SSE2 pshufd. See "Fastest way to do horizontal float vector sum on x86".)

I think even first-gen Atom has SSSE3. https://en.wikipedia.org/wiki/Intel_Atom.

There are CPUs like AMD Geode that don't have SSE at all, but I think the point of the question is CPUs that do have SSE2/3 but not SSSE3.

There are no new mainstream CPUs being made that don't have SSE4.2, but some Phenom II CPUs are probably still in use even in 2018. The older they are, the more it's expected that new software might not work on them.

There are unfortunately still brand-new mainstream CPUs being made without AVX and BMI: Intel's Pentium and Celeron models, even for Skylake / Kaby Lake. Presumably when a die has defects in the upper 128 bits of its vector ALUs, e.g. the large FMA units, they fuse it off and disable decoding of VEX prefixes, and label it as a Pentium or Celeron (see Footnote 1). (This is presumably why Pentium/Celeron models don't support BMI1/BMI2 either; other than pext/pdep those take trivial die area.)

So we're not getting any closer to BMI1/BMI2 being baseline at some point in the future, which is really unfortunate because it's required for single-uop variable-count shifts on Intel CPUs. (shl reg, cl is 3 uops because of the cl=0 no-flag-update case being possible; SHLX / SHRX are 1 uop.) BMI1/2 is most useful when used throughout your whole code, not just in a couple of functions.
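
For reference, this is the kind of source code affected: an ordinary variable-count shift. A sketch, assuming GCC or Clang: with BMI2 enabled (for example `-mbmi2`) the compiler can typically emit a single SHLX here, and without it the count has to go through cl for a legacy shl. The exact output depends on compiler and flags, so check the generated asm. The function name is made up for illustration.

```c
#include <stdint.h>

/* Variable-count left shift. The count is masked to 0..63 so the C
 * expression is well defined for any count; x86 applies the same
 * masking in hardware for 64-bit shifts. */
uint64_t shift_left_var(uint64_t value, unsigned count)
{
    return value << (count & 63u);
}
```

Because shifts like this are scattered everywhere, the benefit only shows up if the whole program is built with BMI2 as a baseline, not just a few dispatched functions.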

Footnote 1: Certainly some fully-working chips get this treatment, too, especially once yields improve for a new process, but for consistency / market-segmentation they're still crippled.

But I think rep movs / rep stos (ERMSB) still work with 256-bit loads/stores, so the FP register file, load/store units, and bypass forwarding network would all still need to support full width. (And ERMSB becomes much more attractive vs. vector loops on these CPUs, because it can use twice the width of the 128-bit vectors that software is limited to.)

I wonder if there's a way for the CPU to be rewired with fuses so it can use any 2 of the 4 128-bit lanes of FMA units that are working. We know Skylake-AVX512 can mix and match FMA units with ports 0, 1, and 5, only powering up the p5 FMA (if available) for 512-bit vectors, and combining the 256-bit FMA units on p0 and p1 as one 512-bit FMA unit. Statically doing something like that with fuses could let Intel use chips that had a defect affecting both lanes of what would have been one FMA unit.

Anyway, this is pure guesswork. It's likely, but I don't know of any reliable source that Intel actually ever did this as a way to sell chips with FMA defects. We do know that chips with defects in a whole physical core get sold as lower core-count SKUs, like a dual-core chip from a quad-core die. And quad-core i5 CPUs having only 6MB of L3 cache instead of 8MB means one of their 4 slices of L3 cache is disabled, again probably for salvaging defects.
