Problem description
I'm trying to determine an efficient method for detecting the availability of AVX and AVX2 on Intel and AMD processors. While reading the Intel Software Developer's Manual, Volume 1 (Managing State Using the XSAVE Feature Set, p. 310), I was somewhat surprised to learn that detection is tied closely to SSE and XSAVE.
Intel posts some code for detecting AVX availability in the blog post "Is AVX enabled?". The code is shown below, and it's not too painful. The problem is Visual Studio: for x64 we need to move the code out of the C/C++ files and into ASM files.
Others seem to be taking the SIGILL approach to detecting AVX availability, or they are unwittingly using the SIGILL method. See, for example, "SIGILL on AVX instruction".
My question is: is it safe to use the SIGILL method to detect AVX availability? Here, "safe" means an AVX instruction will not generate a SIGILL when the CPU and OS support AVX, and it will generate a SIGILL otherwise.
The code below is for 32-bit machines, and it's from the Intel blog post "Is AVX enabled?". The thing that worries me is manipulating the control registers. Reading and writing some x86 and ARM control registers sometimes requires superuser/administrator privileges. That's the reason I prefer SIGILL (and avoiding control registers).
; int isAvxSupported();
isAvxSupported proc
    xor eax, eax
    cpuid
    cmp eax, 1              ; does CPUID support eax = 1?
    jb not_supported
    mov eax, 1
    cpuid
    and ecx, 018000000h     ; check bit 27 (OS uses XSAVE/XRSTOR)
    cmp ecx, 018000000h     ; and bit 28 (AVX supported by CPU)
    jne not_supported
    xor ecx, ecx            ; XFEATURE_ENABLED_MASK/XCR0 register number = 0
    xgetbv                  ; XFEATURE_ENABLED_MASK register is in edx:eax
    and eax, 110b
    cmp eax, 110b           ; check that the OS saves/restores SSE and AVX state at context switch
    jne not_supported
supported:
    mov eax, 1
    ret
not_supported:
    xor eax, eax
    ret
isAvxSupported endp
Recommended answer
First, a bit of theory.
In order to use the AVX instruction set, a few conditions must be met:
- CR4.OSXSAVE[bit 18] must be 1.
  This flag is set by the OS to signal to the processor that it supports the xsave extensions. The xsave extensions are the only way to save the AVX state (fxsave doesn't save the ymm registers), so the OS must support them.
- XCR0.SSE[bit 1] and XCR0.AVX[bit 2] must be 1.
  These flags are set by the OS to signal to the processor that it supports saving and restoring the SSE and AVX states (through xsave).
- CPUID.1:ECX.AVX[bit 28] must be 1.
  Of course, the processor must support the AVX extensions in the first place.
All these registers are readable from user mode except CR4.
Fortunately, the CR4.OSXSAVE bit is reflected in CPUID.1:ECX.OSXSAVE[bit 27], so all the information is accessible from user mode. No privileged instructions are involved.
In order to use the AVX extensions, both hardware support (CPUID.1:ECX.AVX and CPUID.1:ECX.XSAVE) and OS support (CPUID.1:ECX.OSXSAVE, XCR0.SSE and XCR0.AVX) must be present.
Since the OS signals its support for xsave only in the presence of hardware support, testing CPUID.1:ECX.OSXSAVE is enough; CPUID.1:ECX.XSAVE need not be checked separately.
For the AVX extensions, testing CPUID.1:ECX.AVX is still recommended, as the OS may set XCR0.AVX even if AVX is not supported.
This leads to the official Intel algorithm, which is strongly recommended and is exactly the one you posted.
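Since the question mentions the inconvenience of moving code into ASM files for x64, here is a minimal sketch of the same checks in C. It assumes the compiler provides `__cpuid` and `_xgetbv` (MSVC) or `<cpuid.h>` with `__get_cpuid` plus inline assembly (GCC/Clang). The function names `avx_usable_from_bits`, `read_cpuid1_ecx`, `read_xcr0` and `is_avx_usable` are hypothetical names chosen for illustration; they are not part of Intel's listing.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the conditions listed above. */
#define CPUID1_ECX_OSXSAVE (UINT32_C(1) << 27) /* mirrors CR4.OSXSAVE */
#define CPUID1_ECX_AVX     (UINT32_C(1) << 28) /* CPU implements AVX  */
#define XCR0_SSE           (UINT64_C(1) << 1)
#define XCR0_AVX           (UINT64_C(1) << 2)

/* Same mask as the 018000000h constant in the assembly listing. */
static_assert((CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX) == UINT32_C(0x18000000),
              "CPUID mask mismatch");

/* Pure predicate: decides from raw register values, so the logic can be
   checked without touching the hardware. */
int avx_usable_from_bits(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint32_t need_cpuid = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    const uint64_t need_xcr0  = XCR0_SSE | XCR0_AVX;
    return (cpuid1_ecx & need_cpuid) == need_cpuid &&
           (xcr0 & need_xcr0) == need_xcr0;
}

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#include <immintrin.h>
/* Returns 1 and stores CPUID.1:ECX if leaf 1 exists, 0 otherwise. */
static int read_cpuid1_ecx(uint32_t *ecx)
{
    int r[4];
    __cpuid(r, 0);
    if (r[0] < 1)
        return 0;
    __cpuid(r, 1);
    *ecx = (uint32_t)r[2];
    return 1;
}
static uint64_t read_xcr0(void) { return _xgetbv(0); }
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
static int read_cpuid1_ecx(uint32_t *ecx)
{
    unsigned int a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d))
        return 0;
    *ecx = c;
    return 1;
}
static uint64_t read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}
#else
/* Assumption: on non-x86 targets this sketch simply reports "no AVX". */
static int read_cpuid1_ecx(uint32_t *ecx) { (void)ecx; return 0; }
static uint64_t read_xcr0(void) { return 0; }
#endif

int is_avx_usable(void)
{
    uint32_t ecx = 0;
    if (!read_cpuid1_ecx(&ecx))
        return 0;
    /* XGETBV faults unless OSXSAVE is set, so check the CPUID bits first
       (the XCR0 argument here is a placeholder with both bits set). */
    if (!avx_usable_from_bits(ecx, XCR0_SSE | XCR0_AVX))
        return 0;
    return avx_usable_from_bits(ecx, read_xcr0());
}
```

Calling `is_avx_usable()` once at startup and caching the result should be enough for dispatching between AVX and non-AVX code paths. Keeping the bit test in a separate pure function is only a design convenience for testing.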
Catching exceptions to detect support for the AVX extensions will also work, provided you can guarantee that the exception caught is #UD.
For example, when executing vzeroall the only possible exceptions are #UD and #NM.
The first is raised only when the instruction is malformed or one of the conditions stated at the beginning is not met.
So unless you have a broken assembler/compiler, it is exactly equivalent to those conditions.
The latter is raised as an optimisation for saving the AVX state and, as such, it is not exposed to user-mode programs by the OS.
Therefore, catching SIGILL on vzeroall or a similar instruction would also work.
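The SIGILL approach can be sketched roughly as follows. This is an illustration only: it assumes a POSIX system, GCC or Clang on x86-64, and a single-threaded caller. The names `probe_avx_by_sigill`, `on_sigill` and `probe_env` are hypothetical. The probe returns 1 if vzeroall executed, 0 if SIGILL was caught, and -1 where the sketch does not apply.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

#if defined(__GNUC__) && defined(__x86_64__) && !defined(_WIN32)
static sigjmp_buf probe_env;

/* Jump back to the probe instead of re-executing the faulting instruction. */
static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

int probe_avx_by_sigill(void)
{
    struct sigaction sa, old;
    volatile int result = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return -1;

    /* savemask = 1 so the signal mask is restored after the jump. */
    if (sigsetjmp(probe_env, 1) == 0) {
        /* vzeroall zeroes every vector register, so all are clobbered. */
        __asm__ volatile("vzeroall" :::
            "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
            "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14",
            "xmm15");
        result = 1; /* reached only if the instruction did not fault */
    }

    /* Put the previous SIGILL disposition back. */
    int rc = sigaction(SIGILL, &old, NULL);
    assert(rc == 0);
    (void)rc;
    return result;
}
#else
/* Assumption: elsewhere the sketch reports "unknown" rather than guessing. */
int probe_avx_by_sigill(void) { return -1; }
#endif
```

Because the handler and jump buffer are process-wide, a probe like this is best run once, early, before other threads start. The CPUID/XGETBV check avoids that constraint, which is one reason to prefer it.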