本文介绍了错误:内联无法调用always_inline的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试对某些文件实施和编码,其中一些包含SIMD调用.我已经在服务器上编译了此代码,该服务器与我的机器运行的操作系统基本上相同,但是我无法对其进行编译.

I am trying to implement and code on some files, some of which contain SIMD-calls. I have compiled this code on a server, running basically the same OS as my machine, yet i cant compile it.

这是错误:

make
g++ main.cpp -march=native -o main -fopenmp
In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:53:0,
                 from tensor.hpp:9,
                 from main.cpp:4:
/usr/lib/gcc/x86_64-linux-gnu/7/include/avx512vlintrin.h: In function ‘_ZN6TensorIdE8add_avx2ERKS0_._omp_fn.5’:
/usr/lib/gcc/x86_64-linux-gnu/7/include/avx512vlintrin.h:447:1: error: inlining failed in call to always_inline ‘__m256d _mm256_mask_add_pd(__m256d, __mmask8, __m256d, __m256d)’: target specific option mismatch
 _mm256_mask_add_pd (__m256d __W, __mmask8 __U, __m256d __A,
 ^~~~~~~~~~~~~~~~~~
In file included from main.cpp:4:0:
tensor.hpp:228:33: note: called from here
         res = _mm256_mask_add_pd(tmp, 0xFF, _mm256_mask_loadu_pd(tmp, 0xFF, &elements[i]), _mm256_mask_loadu_pd(tmp, 0xFF, &a.elements[i]));
               ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:53:0,
                 from tensor.hpp:9,
                 from main.cpp:4:
/usr/lib/gcc/x86_64-linux-gnu/7/include/avx512vlintrin.h:610:1: error: inlining failed in call to always_inline ‘__m256d _mm256_mask_loadu_pd(__m256d, __mmask8, const void*)’: target specific option mismatch
 _mm256_mask_loadu_pd (__m256d __W, __mmask8 __U, void const *__P)
 ^~~~~~~~~~~~~~~~~~~~
In file included from main.cpp:4:0:
tensor.hpp:228:33: note: called from here
         res = _mm256_mask_add_pd(tmp, 0xFF, _mm256_mask_loadu_pd(tmp, 0xFF, &elements[i]), _mm256_mask_loadu_pd(tmp, 0xFF, &a.elements[i]));
               ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:53:0,
                 from tensor.hpp:9,
                 from main.cpp:4:
/usr/lib/gcc/x86_64-linux-gnu/7/include/avx512vlintrin.h:610:1: error: inlining failed in call to always_inline ‘__m256d _mm256_mask_loadu_pd(__m256d, __mmask8, const void*)’: target specific option mismatch
 _mm256_mask_loadu_pd (__m256d __W, __mmask8 __U, void const *__P)
 ^~~~~~~~~~~~~~~~~~~~
In file included from main.cpp:4:0:
tensor.hpp:228:33: note: called from here
         res = _mm256_mask_add_pd(tmp, 0xFF, _mm256_mask_loadu_pd(tmp, 0xFF, &elements[i]), _mm256_mask_loadu_pd(tmp, 0xFF, &a.elements[i]));
               ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Makefile:7: recipe for target 'main' failed
make: *** [main] Error 1

谷歌搜索这个问题并没有真正帮助,因为所有答案都指出了一切,我已经做好/尝试过了.

Googling the problem didnt really help, as all answers pointed things out, i allready do/tried.

有人可以提供一些背景信息,以了解为什么它不起作用.

Can somebody provide some background as to why it doesn´t work.

int main(){
#ifdef __AVX512F___
    auto tt = createTensor();
    auto tt2 = createTensor();
    auto res = tt.addAVX512(tt2);
#endif
}

//This is in tensor.hpp
#ifdef __AVX512F__
Tensor<T> Tensor::addAVX512(_param_){
   res = _mm256_mask_add_pd(tmp, 0xFF, _mm256_mask_loadu_pd(tmp, 0xFF, &elements[i]), _mm256_mask_loadu_pd(tmp, 0xFF, &a.elements[i]));
}
#endif

这就是发生的事情的要点...我已将所有SIMD调用都封装在#ifdefs等中.

This it the gist of what happens ... i have encased all SIMDcalls in #ifdefs, etc.

推荐答案

GCC仅允许您将内部函数用于已启用编译器使用的指令集.例如有关AVX1内在函数的一个相关问题:内联无法调用always_inline'__m256d _mm256_broadcast_sd(const double *)'

GCC will only let you use intrinsics for instruction sets that are enabled for the compiler to use. e.g. a related question about an AVX1 intrinsic: inlining failed in call to always_inline '__m256d _mm256_broadcast_sd(const double*)'

这些是256位内部函数的_mask_版本,它们需要AVX512VL.

These are _mask_ versions of 256-bit intrinsics, they require AVX512VL.

(我在关于-mavx的问题上的评论是错误的,我没有注意到名称或参数中的_mask,只是_mm256.)

(My comments under the question about -mavx were wrong, I didn't notice the _mask in the name or args, just the _mm256.)

您可能正在服务器上的KNL(骑士降落/至强皮)上进行编译,该服务器具有AVX512F但没有AVX512VL.因此-march=native将设置-mavx512f. (与具有AVX512VL的Skylake-AVX512不同,Skylake-AVX512允许使用更酷的新AVX512内容,例如带有更窄矢量的屏蔽指令.)

You're probably compiling on KNL (Knight's Landing / Xeon Phi) on your server, which has AVX512F but not AVX512VL. So -march=native will set -mavx512f. (Unlike Skylake-AVX512 which does have AVX512VL allowing use of cool new AVX512 stuff like masked instructions with narrower vectors.)

并且您在tensor.hpp中发现了一个错误,在该错误中,您仅检查了__AVX512F__而不是__AVX512VL__后才使用AVX512VL内部函数. AVX512-表示512F,所以它没有不需要同时检查两者.

And you've found a bug in your tensor.hpp, where you use AVX512VL intrinsics after only checking for __AVX512F__ instead of __AVX512VL__. AVX512-anything implies 512F, so it doesn't need to check both.

#ifdef __AVX512F__    // should be __AVX512VL__
Tensor<T> Tensor::addAVX512(_param_){
   res = _mm256_mask_add_pd(tmp, 0xFF, _mm256_mask_loadu_pd(tmp, 0xFF, &elements[i]), _mm256_mask_loadu_pd(tmp, 0xFF, &a.elements[i]));
}
#endif

这是没有意义的,如果要使用常量全一掩码,则不需要使用这些内在函数的掩码版本.像普通人一样使用_mm256_add_pd,仅检查__AVX__.或使用_mm512_add_pd.

This is just pointless, you don't need to use the masked versions of these intrinsics if you're going to use constant all-ones masks. Use _mm256_add_pd like a normal person and only check for __AVX__. Or use _mm512_add_pd.

我最初以为这是来自TensorFlow,但是(根据您的评论)没有任何意义.并不会那么糟糕. 使用完全真实的掩码将掩码合并为同一副本tmp的3个副本是没有道理的;如果编译器无法将mask = all-ones优化到未屏蔽的负载中,那么它似乎会引入错误的依赖关系,这是一种愚蠢的方法.

I thought at first this was from TensorFlow, but (from your comments) that doesn't make sense. And it can't be that badly written. Merge-masking into 3 copies of the same tmp with an all-true mask just makes no sense; it looks like a silly way to introduce a false dependency if the compiler can't optimize away the mask=all-ones into an unmasked load.

还有可怕的C ++风格:您有一个名为__m256d tmp的变量作为全局或类成员?它甚至不是局部哑变量,它可能存在于编译器无法完全对其进行优化的地方.

And also terrible C++ style: you have a variable called __m256d tmp as a global or class member?? It's not even a local dummy variable, it may exist somewhere the compiler can't fully optimize it away.

这篇关于错误:内联无法调用always_inline的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-29 10:04