本文介绍了使用较新的SIMD版本时是否可用?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当我可以使用SSE3或AVX时,可以使用较旧的SSE版本作为SSE2或MMX-
还是我仍然需要单独检查它们?

When I can use SSE3 or AVX, are then older SSE versions as SSE2 or MMX available -
or do I still need to check for them separately?

推荐答案

通常,它们是可加的,但请记住,多年来,英特尔和AMD在这些方面的支持存在差异.

In general, these have been additive but keep in mind that there are differences between Intel and AMD support for these over the years.

如果您具有AVX,则也可以假定SSE,SSE2,SSE3,SSSE3,SSE4.1和SSE 4.2.请记住,要使用AVX,还需要验证OSXSAVE CPUID位置1,以确保所使用的OS实际上也支持保存AVX寄存器.

If you have AVX, then you can assume SSE, SSE2, SSE3, SSSE3, SSE4.1, and SSE 4.2 as well. Remember that to use AVX you also need to validate the OSXSAVE CPUID bit is set to ensure the OS you are using actually supports saving the AVX registers as well.

您仍然应该显式检查代码中使用的所有CPUID支持的健壮性(例如,同时检查AVX,OSXSAVE,SSE4,SSE3,SSSE3,以保护您的AVX代码路径).

You should still explicitly check for all the CPUID support you use in your code for robustness (say checking for AVX, OSXSAVE, SSE4, SSE3, SSSE3 at the same time to guard your AVX codepaths).

#include <intrin.h>

inline bool IsAVXSupported()
{
#if defined(_M_IX86 ) || defined(_M_X64)
   int CPUInfo[4] = {-1};
   __cpuid( CPUInfo, 0 );

   if ( CPUInfo[0] < 1  )
       return false;

    __cpuid(CPUInfo, 1 );

    int ecx = 0x10000000 // AVX
              | 0x8000000 // OSXSAVE
              | 0x100000 // SSE 4.2
              | 0x80000 // SSE 4.1
              | 0x200 // SSSE3
              | 0x1; // SSE3

    if ( ( CPUInfo[2] & ecx ) != ecx )
        return false;

    return true;
#else
    return false;
#endif
}

所有具有x64本机功能的处理器都需要SSE和SSE2,因此它们是所有代码的良好基准假设. Windows 8.0,Windows 8.1和Windows 10即使对于x86体系结构也明确要求SSE和SSE2支持,因此这些指令集非常普遍.换句话说,如果您未能通过SSE或SSE2检查,只需退出致命错误的应用即可.

SSE and SSE2 are required for all processors capable of x64 native, so they are good baseline assumptions for all code. Windows 8.0, Windows 8.1, and Windows 10 explicitly require SSE and SSE2 support even for x86 architectures so those instruction sets are pretty ubiquitous. In other words, if you fail a check for SSE or SSE2, just exit the app with a fatal error.

#include <windows.h>

inline bool IsSSESupported()
{
#if defined(_M_IX86 ) || defined(_M_X64)
   return ( IsProcessorFeaturePresent( PF_XMMI_INSTRUCTIONS_AVAILABLE ) != 0 && IsProcessorFeaturePresent( PF_XMMI64_INSTRUCTIONS_AVAILABLE ) != 0 );
#else
    return false;
#endif
}

-或-

#include <intrin.h>

inline bool IsSSESupported()
{
#if defined(_M_IX86 ) || defined(_M_X64)
   int CPUInfo[4] = {-1};
   __cpuid( CPUInfo, 0 );

   if ( CPUInfo[0] < 1  )
       return false;

    __cpuid(CPUInfo, 1 );

    int edx = 0x4000000 // SSE2
              | 0x2000000; // SSE

    if ( ( CPUInfo[3] & edx ) != edx )
        return false;

    return true;
#else
    return false;
#endif
}

此外,请记住,MMX,x87 FPU和 AMD 3DNow! *都是x64本机不推荐使用的指令集,因此您不应再在较新的代码中积极使用它们.一个好的经验法则是避免使用任何返回__m64或采用__m64数据类型的内在函数.

Also, keep in mind that MMX, x87 FPU, and AMD 3DNow!* are all deprecated instruction sets for x64 native, so you shouldn't be using them actively anymore in newer code. A good rule of thumb is to avoid using any intrinsic that returns a __m64 or takes a __m64 data type.

您可能想看看 DirectXMath博客系列关于这些指令集和相关处理器支持要求的注释.

You may want to check out this DirectXMath blog series with notes on many of these instruction sets and the relevant processor support requirements.

注意(*)-所有AMD 3DNow!除PREFETCHPREFETCHW(已结转)外,不提倡使用该指令.第一代Intel64处理器不支持这些指令,但后来又添加了它们,因为它们被视为核心X64指令集的一部分. Windows 8.1和Windows 10 x64特别需要PREFETCHW,尽管测试有点奇怪.实际上,大多数Broadwell之前的Intel CPU并没有通过CPUID报告对PREFETCHW的支持,但是它们将操作码视为无操作,而不是抛出非法指令"异常.因此,这里的测试是(a)CPUID是否支持,(b)如果不支持,则PREFETCHW至少不会引发异常.

Note (*) - All the AMD 3DNow! instructions are deprecated except for PREFETCH and PREFETCHW which were carried forward. First generation Intel64 processors lacked support for these instructions, but they were later added as they are considered part of the core X64 instruction set. Windows 8.1 and Windows 10 x64 require PREFETCHW in particular, although the test is a little odd. Most Intel CPUs prior to Broadwell do not in fact report support for PREFETCHW through CPUID, but they treat the opcode as a no-op rather than throw an 'illegal instruction' exception. As such, the test here is (a) is it supported by CPUID, and (b) if not, does PREFETCHW at least not throw an exception.

下面是Visual Studio的一些测试代码,这些代码演示了PREFETCHW测试以及x86和x64平台上的许多其他CPUID位.

Here's some test code for Visual Studio that demonstrates the PREFETCHW test as well as many other CPUID bits for the x86 and x64 platforms.

#include <intrin.h>
#include <stdio.h>
#include <windows.h>
#include <excpt.h>

void main()
{
   unsigned int x = _mm_getcsr();
   printf("%08X\n", x );

   bool prefetchw = false;

   // See http://msdn.microsoft.com/en-us/library/hskdteyh.aspx
   int CPUInfo[4] = {-1};
   __cpuid( CPUInfo, 0 );

   if ( CPUInfo[0] > 0 )
   {
       __cpuid(CPUInfo, 1 );

       // EAX
       {
           int stepping = (CPUInfo[0] & 0xf);
           int basemodel = (CPUInfo[0] >> 4) & 0xf;
           int basefamily = (CPUInfo[0] >> 8) & 0xf;
           int xmodel = (CPUInfo[0] >> 16) & 0xf;
           int xfamily = (CPUInfo[0] >> 20) & 0xff;

           int family = basefamily + xfamily;
           int model = (xmodel << 4) | basemodel;

           printf("Family %02X, Model %02X, Stepping %u\n", family, model, stepping );
       }

       // ECX
       if ( CPUInfo[2] & 0x20000000 ) // bit 29
          printf("F16C\n");

       if ( CPUInfo[2] & 0x10000000 ) // bit 28
          printf("AVX\n");

       if ( CPUInfo[2] & 0x8000000 ) // bit 27
          printf("OSXSAVE\n");

       if ( CPUInfo[2] & 0x400000 ) // bit 22
          printf("MOVBE\n");

       if ( CPUInfo[2] & 0x100000 ) // bit 20
          printf("SSE4.2\n");

       if ( CPUInfo[2] & 0x80000 ) // bit 19
          printf("SSE4.1\n");

       if ( CPUInfo[2] & 0x2000 ) // bit 13
          printf("CMPXCHANG16B\n");

       if ( CPUInfo[2] & 0x1000 ) // bit 12
          printf("FMA3\n");

       if ( CPUInfo[2] & 0x200 ) // bit 9
          printf("SSSE3\n");

       if ( CPUInfo[2] & 0x1 ) // bit 0
          printf("SSE3\n");

       // EDX
       if ( CPUInfo[3] & 0x4000000 ) // bit 26
           printf("SSE2\n");

       if ( CPUInfo[3] & 0x2000000 ) // bit 25
           printf("SSE\n");

       if ( CPUInfo[3] & 0x800000 ) // bit 23
           printf("MMX\n");
   }
   else
       printf("CPU doesn't support Feature Identifiers\n");

   if ( CPUInfo[0] >= 7 )
   {
       __cpuidex(CPUInfo, 7, 0);

       // EBX
       if ( CPUInfo[1] & 0x100 ) // bit 8
         printf("BMI2\n");

       if ( CPUInfo[1] & 0x20 ) // bit 5
         printf("AVX2\n");

       if ( CPUInfo[1] & 0x8 ) // bit 3
         printf("BMI\n");
   }
   else
       printf("CPU doesn't support Structured Extended Feature Flags\n");

   // Extended features
   __cpuid( CPUInfo, 0x80000000 );

   if ( CPUInfo[0] > 0x80000000 )
   {
       __cpuid(CPUInfo, 0x80000001 );

       // ECX
       if ( CPUInfo[2] & 0x10000 ) // bit 16
           printf("FMA4\n");

       if ( CPUInfo[2] & 0x800 ) // bit 11
           printf("XOP\n");

       if ( CPUInfo[2] & 0x100 ) // bit 8
       {
           printf("PREFETCHW\n");
           prefetchw = true;
       }

       if ( CPUInfo[2] & 0x80 ) // bit 7
           printf("Misalign SSE\n");

       if ( CPUInfo[2] & 0x40 ) // bit 6
           printf("SSE4A\n");

       if ( CPUInfo[2] & 0x1 ) // bit 0
           printf("LAHF/SAHF\n");

       // EDX
       if ( CPUInfo[3] & 0x80000000 ) // bit 31
           printf("3DNow!\n");

       if ( CPUInfo[3] & 0x40000000 ) // bit 30
           printf("3DNowExt!\n");

       if ( CPUInfo[3] & 0x20000000 ) // bit 29
           printf("x64\n");

       if ( CPUInfo[3] & 0x100000 ) // bit 20
           printf("NX\n");
   }
   else
       printf("CPU doesn't support Extended Feature Identifiers\n");

   if ( !prefetchw )
   {
       bool illegal = false;

       __try
       {
           static const unsigned int s_data = 0xabcd0123;

           _m_prefetchw(&s_data);
       }
       __except (EXCEPTION_EXECUTE_HANDLER)
       {
           illegal = true;
       }

       if (illegal)
       {
           printf("PREFETCHW is an invalid instruction on this processor\n");
       }
   }
}

更新:当然,基本的挑战是如何处理不支持AVX的系统?虽然指令集很有用,但拥有支持AVX的处理器的最大好处是能够使用/arch:AVX构建开关,该开关可以全局使用 VEX前缀,以获得更好的SSE/SSE2代码生成.唯一的问题是生成的代码DLL/EXE与缺少AVX支持的系统不兼容.

UPDATE: The fundamental challenge, of course, is how do you handle systems that lack support for AVX? While the instruction set is useful, the biggest benefit of having an AVX-capable processor is the ability to use the /arch:AVX build switch which enables the global use of the VEX prefix for better SSE/SSE2 code-gen. The only problem is the resulting code DLL/EXE is not compatible with systems that lack AVX support.

因此,对于Windows,理想情况下,您应该为非AVX系统构建一个EXE(假设仅SSE/SSE2,因此对于x86代码使用/arch:SSE2代替;对于x64代码,此设置是隐式的),而另一种EXE是针对AVX进行了优化(使用/arch:AVX),然后使用CPU检测来确定给定系统要使用哪个EXE.

As such, for Windows, ideally you should build one EXE for non-AVX systems (assuming SSE/SSE2 only so use /arch:SSE2 instead for x86 code; this setting is implicit for x64 code), a different EXE that is optimized for AVX (using /arch:AVX), and then use CPU detection to determine which EXE to use for a given system.

幸运的是,借助Xbox One,我们始终可以使用/arch::AVX进行构建,因为它是固定平台...

Luckily with Xbox One, we can just always build with /arch::AVX since it's a fixed platform...

更新2:对于clang/LLVM,您应该对CPUID使用稍微不同的本征:

UPDATE 2: For clang/LLVM, you should use slight dikyfferent intriniscs for CPUID:

if defined(__clang__) || defined(__GNUC__)
    __cpuid(1, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
#else
    __cpuid(CPUInfo, 1);
#endif
if defined(__clang__) || defined(__GNUC__)
    __cpuid_count(7, 0, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]);
#else
    __cpuidex(CPUInfo, 7, 0);
#endif

这篇关于使用较新的SIMD版本时是否可用?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-04 17:15