Question
When I can use SSE3 or AVX, are older SIMD extensions such as SSE2 or MMX guaranteed to be available as well, or do I still need to check for them separately?
Recommended answer
In general, these instruction sets have been additive, but keep in mind that Intel and AMD support for them has differed over the years.
If you have AVX, then you can assume SSE, SSE2, SSE3, SSSE3, SSE4.1, and SSE4.2 as well. Remember that to use AVX you also need to verify that the OSXSAVE CPUID bit is set, to ensure the OS you are running on actually supports saving the AVX registers.
For robustness, you should still explicitly check for every CPUID feature your code relies on (for example, checking AVX, OSXSAVE, SSE4.1, SSE4.2, SSSE3, and SSE3 together to guard your AVX code paths):
#include <intrin.h>

inline bool IsAVXSupported()
{
#if defined(_M_IX86 ) || defined(_M_X64)
    int CPUInfo[4] = {-1};

    // Leaf 0: EAX holds the highest supported basic leaf
    __cpuid( CPUInfo, 0 );
    if ( CPUInfo[0] < 1 )
        return false;

    // Leaf 1: feature bits are returned in ECX (CPUInfo[2])
    __cpuid(CPUInfo, 1 );

    int ecx = 0x10000000 // AVX
            | 0x8000000  // OSXSAVE
            | 0x100000   // SSE 4.2
            | 0x80000    // SSE 4.1
            | 0x200      // SSSE3
            | 0x1;       // SSE3

    // Every bit in the mask must be set
    if ( ( CPUInfo[2] & ecx ) != ecx )
        return false;

    return true;
#else
    return false;
#endif
}
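The OSXSAVE bit only says that the OS has enabled XSAVE and that XGETBV can be executed. Whether the OS actually saves the YMM registers is reported in XCR0, where bit 1 covers XMM state and bit 2 covers YMM state. A stricter guard could look something like the following minimal sketch. The names `XcrEnablesAvx`, `ReadXcr0`, and `IsAVXUsable` are made up for illustration, and the clang/GCC branch assumes `<cpuid.h>` and GNU-style inline assembly are available.

```cpp
#include <cstdint>

#if (defined(__clang__) || defined(__GNUC__)) && (defined(__i386__) || defined(__x86_64__))
  #include <cpuid.h>
  #define AVX_SKETCH_GNU 1
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>
  #include <immintrin.h>
  #define AVX_SKETCH_MSVC 1
#endif

// Pure helper: true when XCR0 shows the OS saving both XMM (bit 1) and YMM (bit 2) state.
inline bool XcrEnablesAvx(std::uint64_t xcr0)
{
    return (xcr0 & 0x6) == 0x6;
}

#if defined(AVX_SKETCH_GNU) || defined(AVX_SKETCH_MSVC)
// Reads XCR0. Only call this after CPUID has reported OSXSAVE,
// otherwise XGETBV itself raises an invalid-opcode exception.
inline std::uint64_t ReadXcr0()
{
#if defined(AVX_SKETCH_GNU)
    std::uint32_t lo = 0, hi = 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    return _xgetbv(0);
#endif
}
#endif

// Hypothetical stricter variant of the AVX check: CPUID bits first, then XCR0.
inline bool IsAVXUsable()
{
#if defined(AVX_SKETCH_GNU)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
#elif defined(AVX_SKETCH_MSVC)
    int regs[4] = {0};
    __cpuid(regs, 0);
    if (regs[0] < 1)
        return false;
    __cpuid(regs, 1);
    const unsigned int ecx = static_cast<unsigned int>(regs[2]);
#else
    return false; // not an x86/x64 target
#endif

#if defined(AVX_SKETCH_GNU) || defined(AVX_SKETCH_MSVC)
    const unsigned int mask = 0x10000000u | 0x8000000u; // AVX | OSXSAVE
    if ((ecx & mask) != mask)
        return false;
    return XcrEnablesAvx(ReadXcr0());
#endif
}
```

The XCR0 test is split into a pure function so the bit logic can be checked without depending on the machine the code happens to run on.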
SSE and SSE2 are required on every processor capable of running x64 native code, so they are good baseline assumptions for all code. Windows 8.0, Windows 8.1, and Windows 10 explicitly require SSE and SSE2 support even on x86, so those instruction sets are pretty much ubiquitous. In other words, if a check for SSE or SSE2 fails, just exit the app with a fatal error.
#include <windows.h>

inline bool IsSSESupported()
{
#if defined(_M_IX86 ) || defined(_M_X64)
    return ( IsProcessorFeaturePresent( PF_XMMI_INSTRUCTIONS_AVAILABLE ) != 0
          && IsProcessorFeaturePresent( PF_XMMI64_INSTRUCTIONS_AVAILABLE ) != 0 );
#else
    return false;
#endif
}
-or-
#include <intrin.h>

inline bool IsSSESupported()
{
#if defined(_M_IX86 ) || defined(_M_X64)
    int CPUInfo[4] = {-1};

    // Leaf 0: EAX holds the highest supported basic leaf
    __cpuid( CPUInfo, 0 );
    if ( CPUInfo[0] < 1 )
        return false;

    // Leaf 1: SSE and SSE2 are reported in EDX (CPUInfo[3])
    __cpuid(CPUInfo, 1 );

    int edx = 0x4000000  // SSE2
            | 0x2000000; // SSE

    if ( ( CPUInfo[3] & edx ) != edx )
        return false;

    return true;
#else
    return false;
#endif
}
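The "fail fast" advice above could be wired up as a small startup guard. The following is only a sketch: `HasBaselineSse` and `RequireBaselineSse` are hypothetical names, and the guard takes the EDX value from CPUID leaf 1 as a parameter so the bit logic stays independent of how the register was read.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// CPUID leaf 1, EDX feature bits (same values as in the check above).
constexpr std::uint32_t kEdxSse  = 0x2000000; // bit 25
constexpr std::uint32_t kEdxSse2 = 0x4000000; // bit 26

// True only when both SSE and SSE2 are reported.
constexpr bool HasBaselineSse(std::uint32_t leaf1Edx)
{
    return (leaf1Edx & (kEdxSse | kEdxSse2)) == (kEdxSse | kEdxSse2);
}

// Hypothetical startup guard: report the problem and terminate
// when the baseline instruction sets are missing.
inline void RequireBaselineSse(std::uint32_t leaf1Edx)
{
    if (!HasBaselineSse(leaf1Edx))
    {
        std::fputs("Fatal: this application requires SSE and SSE2.\n", stderr);
        std::exit(EXIT_FAILURE);
    }
}
```

Calling `RequireBaselineSse` once at the top of `main`, with the EDX value obtained from a leaf 1 query, would keep the rest of the program free of SSE/SSE2 checks.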
Also, keep in mind that MMX, the x87 FPU, and AMD 3DNow!* are all deprecated instruction sets for x64 native code, so you shouldn't be actively using them in newer code. A good rule of thumb is to avoid any intrinsic that returns or takes a __m64 data type.
You may want to check out the DirectXMath blog series, which has notes on many of these instruction sets and the relevant processor support requirements.
Note (*): All of the AMD 3DNow! instructions are deprecated except for PREFETCH and PREFETCHW, which were carried forward. First-generation Intel64 processors lacked support for these instructions, but they were added later because they are considered part of the core x64 instruction set. Windows 8.1 and Windows 10 x64 require PREFETCHW in particular, although the test is a little odd. Most Intel CPUs prior to Broadwell do not in fact report support for PREFETCHW through CPUID, but they treat the opcode as a no-op rather than raising an 'illegal instruction' exception. As such, the test is (a) does CPUID report support, and (b) if not, does PREFETCHW at least execute without raising an exception?
Here's some test code for Visual Studio that demonstrates the PREFETCHW test as well as many other CPUID bits for the x86 and x64 platforms.
#include <intrin.h>
#include <stdio.h>
#include <windows.h>
#include <excpt.h>

int main()
{
    // Print the current MXCSR (SSE control/status register) value
    unsigned int x = _mm_getcsr();
    printf("%08X\n", x );

    bool prefetchw = false;

    // See http://msdn.microsoft.com/en-us/library/hskdteyh.aspx
    int CPUInfo[4] = {-1};
    __cpuid( CPUInfo, 0 );

    // Keep the highest basic leaf; CPUInfo is overwritten by every later query
    const int maxLeaf = CPUInfo[0];

    if ( maxLeaf > 0 )
    {
        __cpuid(CPUInfo, 1 );

        // EAX
        {
            int stepping = (CPUInfo[0] & 0xf);
            int basemodel = (CPUInfo[0] >> 4) & 0xf;
            int basefamily = (CPUInfo[0] >> 8) & 0xf;
            int xmodel = (CPUInfo[0] >> 16) & 0xf;
            int xfamily = (CPUInfo[0] >> 20) & 0xff;

            // The extended fields only apply to certain base family values
            int family = (basefamily == 0xf) ? basefamily + xfamily : basefamily;
            int model = (basefamily == 0x6 || basefamily == 0xf) ? ((xmodel << 4) | basemodel) : basemodel;

            printf("Family %02X, Model %02X, Stepping %u\n", family, model, stepping );
        }

        // ECX
        if ( CPUInfo[2] & 0x20000000 ) // bit 29
            printf("F16C\n");

        if ( CPUInfo[2] & 0x10000000 ) // bit 28
            printf("AVX\n");

        if ( CPUInfo[2] & 0x8000000 ) // bit 27
            printf("OSXSAVE\n");

        if ( CPUInfo[2] & 0x400000 ) // bit 22
            printf("MOVBE\n");

        if ( CPUInfo[2] & 0x100000 ) // bit 20
            printf("SSE4.2\n");

        if ( CPUInfo[2] & 0x80000 ) // bit 19
            printf("SSE4.1\n");

        if ( CPUInfo[2] & 0x2000 ) // bit 13
            printf("CMPXCHG16B\n");

        if ( CPUInfo[2] & 0x1000 ) // bit 12
            printf("FMA3\n");

        if ( CPUInfo[2] & 0x200 ) // bit 9
            printf("SSSE3\n");

        if ( CPUInfo[2] & 0x1 ) // bit 0
            printf("SSE3\n");

        // EDX
        if ( CPUInfo[3] & 0x4000000 ) // bit 26
            printf("SSE2\n");

        if ( CPUInfo[3] & 0x2000000 ) // bit 25
            printf("SSE\n");

        if ( CPUInfo[3] & 0x800000 ) // bit 23
            printf("MMX\n");
    }
    else
        printf("CPU doesn't support Feature Identifiers\n");

    if ( maxLeaf >= 7 )
    {
        __cpuidex(CPUInfo, 7, 0);

        // EBX
        if ( CPUInfo[1] & 0x100 ) // bit 8
            printf("BMI2\n");

        if ( CPUInfo[1] & 0x20 ) // bit 5
            printf("AVX2\n");

        if ( CPUInfo[1] & 0x8 ) // bit 3
            printf("BMI\n");
    }
    else
        printf("CPU doesn't support Structured Extended Feature Flags\n");

    // Extended features
    __cpuid( CPUInfo, 0x80000000 );

    if ( CPUInfo[0] > 0x80000000 )
    {
        __cpuid(CPUInfo, 0x80000001 );

        // ECX
        if ( CPUInfo[2] & 0x10000 ) // bit 16
            printf("FMA4\n");

        if ( CPUInfo[2] & 0x800 ) // bit 11
            printf("XOP\n");

        if ( CPUInfo[2] & 0x100 ) // bit 8
        {
            printf("PREFETCHW\n");
            prefetchw = true;
        }

        if ( CPUInfo[2] & 0x80 ) // bit 7
            printf("Misalign SSE\n");

        if ( CPUInfo[2] & 0x40 ) // bit 6
            printf("SSE4A\n");

        if ( CPUInfo[2] & 0x1 ) // bit 0
            printf("LAHF/SAHF\n");

        // EDX
        if ( CPUInfo[3] & 0x80000000 ) // bit 31
            printf("3DNow!\n");

        if ( CPUInfo[3] & 0x40000000 ) // bit 30
            printf("3DNowExt!\n");

        if ( CPUInfo[3] & 0x20000000 ) // bit 29
            printf("x64\n");

        if ( CPUInfo[3] & 0x100000 ) // bit 20
            printf("NX\n");
    }
    else
        printf("CPU doesn't support Extended Feature Identifiers\n");

    // CPUID did not report PREFETCHW: check whether the opcode at least executes
    if ( !prefetchw )
    {
        bool illegal = false;

        __try
        {
            static const unsigned int s_data = 0xabcd0123;

            _m_prefetchw(&s_data);
        }
        __except (EXCEPTION_EXECUTE_HANDLER)
        {
            illegal = true;
        }

        if (illegal)
        {
            printf("PREFETCHW is an invalid instruction on this processor\n");
        }
    }
}
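The family/model/stepping decoding from the EAX block can also be pulled out into a small pure function, which makes it easy to check against known signature values. This is a sketch rather than part of the original test program: `CpuSignature` and `DecodeSignature` are hypothetical names, and the rule applied is the one documented for CPUID leaf 1, where the extended family is added only when the base family is 0xF and the extended model is used only when the base family is 0x6 or 0xF.

```cpp
#include <cstdint>

// Display family/model/stepping decoded from CPUID leaf 1, EAX.
struct CpuSignature
{
    unsigned family;
    unsigned model;
    unsigned stepping;
};

inline CpuSignature DecodeSignature(std::uint32_t eax)
{
    const unsigned stepping   = eax & 0xf;          // bits 3:0
    const unsigned baseModel  = (eax >> 4) & 0xf;   // bits 7:4
    const unsigned baseFamily = (eax >> 8) & 0xf;   // bits 11:8
    const unsigned extModel   = (eax >> 16) & 0xf;  // bits 19:16
    const unsigned extFamily  = (eax >> 20) & 0xff; // bits 27:20

    CpuSignature sig;
    // Extended family only counts when the base family field is saturated at 0xF.
    sig.family = (baseFamily == 0xf) ? baseFamily + extFamily : baseFamily;
    // Extended model only counts for base family 0x6 or 0xF.
    sig.model = (baseFamily == 0x6 || baseFamily == 0xf)
                    ? ((extModel << 4) | baseModel)
                    : baseModel;
    sig.stepping = stepping;
    return sig;
}
```

For example, the value 0x000306C3 decodes to family 0x06, model 0x3C, stepping 3, while 0x00800F11 decodes to family 0x17, model 0x01, stepping 1.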
UPDATE: The fundamental challenge, of course, is how to handle systems that lack AVX support. While the instruction set itself is useful, the biggest benefit of having an AVX-capable processor is the ability to use the /arch:AVX build switch, which enables the VEX prefix globally for better SSE/SSE2 code generation. The only problem is that the resulting DLL/EXE is not compatible with systems that lack AVX support.
As such, for Windows, ideally you should build one EXE for non-AVX systems (assuming SSE/SSE2 only, so use /arch:SSE2 instead for x86 code; this setting is implicit for x64 code) and a different EXE that is optimized for AVX (using /arch:AVX), and then use CPU detection to determine which EXE to run on a given system.
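One way to picture the two-EXE setup is a tiny launcher that picks a binary name from the detection result. The sketch below covers only the selection step. `SelectExecutable` and the `_avx.exe` / `_sse2.exe` naming scheme are hypothetical, and the `avxUsable` flag is assumed to come from a check such as `IsAVXSupported()` above.

```cpp
#include <string>

// Hypothetical naming scheme: "<base>_avx.exe" for the /arch:AVX build,
// "<base>_sse2.exe" for the SSE/SSE2 baseline build.
inline std::string SelectExecutable(const std::string& baseName, bool avxUsable)
{
    return baseName + (avxUsable ? "_avx.exe" : "_sse2.exe");
}
```

The launcher itself should be compiled without /arch:AVX, otherwise it could not run on the very systems it is supposed to detect.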
Luckily, with Xbox One we can just always build with /arch:AVX since it's a fixed platform...
UPDATE 2: For clang/LLVM, you should use slightly different intrinsics for CPUID (the macro forms from <cpuid.h>):
#if defined(__clang__) || defined(__GNUC__)
    // <cpuid.h> form: leaf first, then the four output registers
    __cpuid(1, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
#else
    __cpuid(CPUInfo, 1);
#endif

#if defined(__clang__) || defined(__GNUC__)
    // <cpuid.h> form: leaf, subleaf, then the four output registers
    __cpuid_count(7, 0, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
#else
    __cpuidex(CPUInfo, 7, 0);
#endif
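Rather than repeating the #if at every call site, the two forms can be hidden behind one wrapper. The following is a minimal sketch under a few assumptions: `QueryCpuid` is a hypothetical name, the clang/GCC branch relies on `__get_cpuid_max` and `__cpuid_count` from `<cpuid.h>`, and the function simply reports failure on non-x86 targets.

```cpp
#include <cstdint>

#if (defined(__clang__) || defined(__GNUC__)) && (defined(__i386__) || defined(__x86_64__))
  #include <cpuid.h>
  #define CPUID_SKETCH_GNU 1
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>
  #define CPUID_SKETCH_MSVC 1
#endif

// Fills regs with EAX, EBX, ECX, EDX for the given leaf and subleaf.
// Returns false (and leaves regs zeroed) when the leaf is above the highest
// supported leaf in its range, or when the target is not x86/x64.
inline bool QueryCpuid(std::uint32_t leaf, std::uint32_t subleaf, std::uint32_t regs[4])
{
    regs[0] = regs[1] = regs[2] = regs[3] = 0;

    // 0 selects the basic range, 0x80000000 the extended range.
    const std::uint32_t rangeBase = leaf & 0x80000000u;

#if defined(CPUID_SKETCH_GNU)
    if (__get_cpuid_max(rangeBase, nullptr) < leaf)
        return false;
    __cpuid_count(leaf, subleaf, regs[0], regs[1], regs[2], regs[3]);
    return true;
#elif defined(CPUID_SKETCH_MSVC)
    int info[4] = {0};
    __cpuid(info, static_cast<int>(rangeBase));
    if (static_cast<std::uint32_t>(info[0]) < leaf)
        return false;
    __cpuidex(info, static_cast<int>(leaf), static_cast<int>(subleaf));
    for (int i = 0; i < 4; ++i)
        regs[i] = static_cast<std::uint32_t>(info[i]);
    return true;
#else
    (void)subleaf;
    (void)rangeBase;
    return false;
#endif
}
```

With this in place, the AVX2 test from the program above would read `QueryCpuid(7, 0, regs) && (regs[1] & 0x20)`, independent of the compiler.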