Problem description
I googled a lot but could not manage to compile a C program that uses the _mm_clflushopt function. _mm_clflush works fine, but I want to be able to try the optimized version as well. I checked the CPU flags and clflushopt is included. I include both emmintrin.h and immintrin.h, but at compilation I still get an "undefined reference to _mm_clflushopt" error. I am running gcc -o prog prog.c in a Linux terminal.
Using the x86intrin.h header instead gives me this error during compilation:
error: inlining failed in call to always_inline '_mm_clflushopt'
I would appreciate any help. I am very new to these instructions, and even after trying to find more information I was not able to find C code that uses the optimized version. That's why I decided to ask a question.
Recommended answer
GCC only lets you use intrinsics for instructions that the target CPU supports. The "inlining failed in call to always_inline" error is how GCC reports that the extension is not enabled for the function calling the intrinsic. GCC will never emit clflushopt on its own, so this rule makes more sense for extensions like AVX2, where GCC does know how to auto-vectorize with AVX2 if you let it. Either way, you have to enable an extension before GCC will use its instructions, even when your source calls the intrinsics explicitly.
Use gcc -O3 -march=native to enable all the extensions present on the CPU you're compiling on. (-march still works without enabling optimization, but -O3 is included for future readers who are going to copy/paste the command.)
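As a minimal sketch of what that flag changes, the helper below relies on the __CLFLUSHOPT__ macro that GCC predefines when the extension is enabled (by -march=native on a capable CPU, or by -mclflushopt). The function name flush_line is hypothetical, and the fallback to _mm_clflush is an addition for illustration, not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_sfence, and _mm_clflushopt */

/* Flush the cache line that contains p.
 * Returns 1 if the CLFLUSHOPT path was compiled in, 0 if plain CLFLUSH was.
 * flush_line is a hypothetical helper name used only for this example. */
static int flush_line(void *p)
{
    assert(p != NULL);
#ifdef __CLFLUSHOPT__
    /* Only reached when built with e.g. -march=native or -mclflushopt. */
    _mm_clflushopt(p);
    /* CLFLUSHOPT is weakly ordered, so fence before relying on the flush. */
    _mm_sfence();
    return 1;
#else
    /* Baseline build: CLFLUSH is available on every x86-64 target. */
    _mm_clflush(p);
    return 0;
#endif
}
```

Built with plain gcc -o prog prog.c this takes the _mm_clflush branch; built with gcc -O3 -march=native on a CPU that lists clflushopt, it should take the _mm_clflushopt branch instead.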
Or use -march=skylake or -march=znver1 (Zen), for example, to compile for a specific target CPU regardless of what host you're compiling on. See https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
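A binary built for a specific target may end up running on a CPU that lacks CLFLUSHOPT. One possible way to guard against that, which is not something the answer itself proposes, is to check CPUID at run time and enable the extension for a single function with GCC's target attribute instead of a command-line flag. The sketch below assumes a reasonably recent GCC (one that provides __get_cpuid_count in cpuid.h); the function names are hypothetical.

```c
#include <cpuid.h>
#include <x86intrin.h>

/* CPUID leaf 7, subleaf 0, EBX bit 23 reports CLFLUSHOPT support. */
static int cpu_has_clflushopt(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;                    /* leaf 7 not available */
    return (int)((ebx >> 23) & 1u);
}

/* The target attribute enables CLFLUSHOPT for this function only,
 * so the file compiles even without -march or -mclflushopt. */
__attribute__((target("clflushopt")))
static void flush_line_opt(void *p)
{
    _mm_clflushopt(p);
    _mm_sfence();                    /* CLFLUSHOPT is weakly ordered */
}

/* Returns 1 if CLFLUSHOPT was used, 0 if it fell back to CLFLUSH. */
static int flush_line_dispatch(void *p)
{
    if (cpu_has_clflushopt()) {
        flush_line_opt(p);
        return 1;
    }
    _mm_clflush(p);
    return 0;
}
```

The trade-off is a CPUID query on each call; caching the result in a static variable would avoid that if the function is called in a hot loop.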
The specific option for just CLFLUSHOPT is -mclflushopt, but using -march=skylake also sets -mtune=skylake, which you also want. It also enables AVX2 and earlier, FMA (yes, that's separate from AVX2), BMI1/BMI2, popcnt, RDRAND, RDSEED, and lots of other goodies. (Compile with -march=skylake -fverbose-asm -S and look at the asm comments at the top of the file to see all the -m options enabled / not enabled.)
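A rough in-program analogue of reading those asm comments is to test the macros GCC predefines for each enabled extension. The sketch below covers only four of them, and the table and function names are hypothetical; it is an illustration, not a replacement for the -fverbose-asm listing.

```c
#include <stddef.h>
#include <stdio.h>

/* One entry per extension: its name and whether this build enabled it. */
struct ext_flag {
    const char *name;
    int enabled;
};

static const struct ext_flag ext_flags[] = {
#ifdef __CLFLUSHOPT__
    { "clflushopt", 1 },
#else
    { "clflushopt", 0 },
#endif
#ifdef __AVX2__
    { "avx2", 1 },
#else
    { "avx2", 0 },
#endif
#ifdef __FMA__
    { "fma", 1 },
#else
    { "fma", 0 },
#endif
#ifdef __BMI2__
    { "bmi2", 1 },
#else
    { "bmi2", 0 },
#endif
};

/* Print each extension's status to out and return how many are enabled. */
static int print_enabled_extensions(FILE *out)
{
    int count = 0;
    for (size_t i = 0; i < sizeof ext_flags / sizeof ext_flags[0]; i++) {
        fprintf(out, "%-10s %s\n", ext_flags[i].name,
                ext_flags[i].enabled ? "enabled" : "not enabled");
        count += ext_flags[i].enabled;
    }
    return count;
}
```

Calling print_enabled_extensions(stdout) from main in a build made with -march=skylake should report all four as enabled, while a build with no -march or -m options would normally report none.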