Arm NEON intrinsics vs. hand-written assembly

Question

https://web.archive.org/web/20170227190422/http://hilbert-space.de/?p=22

The page linked above, which is quite dated, shows hand-written asm giving a much greater improvement than the intrinsics. I am wondering whether that is still true now, in 2012.

So has the GNU cross compiler's optimization of intrinsics improved since then?

Recommended answer

My experience is that intrinsics haven't really been worth the trouble. It's too easy for the compiler to inject extra register store/load steps between your intrinsics. Getting it to stop doing that takes more effort than just writing the code in raw NEON assembly. I've seen this in pretty recent compilers (including clang 3.1).
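To make the concern concrete, here is a minimal sketch of the kind of intrinsics code in question. The kernel `scale_add` and the `HAVE_NEON` macro are hypothetical names chosen for illustration, not something from the linked page. The scalar fallback is only there so the file also builds on a non-ARM host.

```c
#include <stddef.h>
#include <stdint.h>

/* HAVE_NEON is a hypothetical helper macro for this sketch. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical kernel: dst[i] = a[i] * scale + b[i] for n floats. */
void scale_add(float *dst, const float *a, const float *b,
               float scale, size_t n)
{
    size_t i = 0;
#if HAVE_NEON
    /* Broadcast the scalar into all four lanes once, outside the loop. */
    float32x4_t vscale = vdupq_n_f32(scale);
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        /* vmlaq_f32(acc, x, y) computes acc + x * y per lane. */
        vst1q_f32(dst + i, vmlaq_f32(vb, va, vscale));
    }
#endif
    /* Scalar tail; also the whole loop when NEON is unavailable. */
    for (; i < n; ++i)
        dst[i] = a[i] * scale + b[i];
}
```

The only reliable way to see whether the compiler added stores and reloads between the intrinsics is to read its output, for example by compiling with `-O2 -S` (plus `-mfpu=neon` on a 32-bit ARM GCC target) and checking that the loop body contains only the two loads, the multiply-accumulate, and the store. Which cross toolchain you invoke depends on your setup.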

At this level, I find you really need to control exactly what's happening. You can get all kinds of stalls if you do things in just barely the wrong order. Doing it in intrinsics feels like surgery with welder's gloves on. If the code is so performance critical that I need intrinsics at all, then intrinsics aren't good enough. Maybe others have different experiences here.
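One common example of ordering mattering is a long dependency chain, where each instruction has to wait for the previous result. The sketch below, with the hypothetical names `sum_u32` and `HAVE_NEON`, sums an array using two independent accumulators so that consecutive adds do not depend on each other. With intrinsics you can express that independence, but the compiler still decides the final instruction order and register allocation, which is the loss of control described above. Whether this helps on a given core is an assumption to verify by measurement.

```c
#include <stddef.h>
#include <stdint.h>

/* HAVE_NEON is a hypothetical helper macro for this sketch. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical kernel: wrapping 32-bit sum of n unsigned integers. */
uint32_t sum_u32(const uint32_t *x, size_t n)
{
    size_t i = 0;
    uint32_t total = 0;
#if HAVE_NEON
    /* Two accumulators: the add into acc1 does not wait on acc0. */
    uint32x4_t acc0 = vdupq_n_u32(0);
    uint32x4_t acc1 = vdupq_n_u32(0);
    for (; i + 8 <= n; i += 8) {
        acc0 = vaddq_u32(acc0, vld1q_u32(x + i));
        acc1 = vaddq_u32(acc1, vld1q_u32(x + i + 4));
    }
    /* Combine the accumulators, then reduce four lanes to one. */
    uint32x4_t acc = vaddq_u32(acc0, acc1);
    uint32x2_t half = vadd_u32(vget_low_u32(acc), vget_high_u32(acc));
    half = vpadd_u32(half, half);
    total = vget_lane_u32(half, 0);
#endif
    /* Scalar tail; also the whole loop when NEON is unavailable. */
    for (; i < n; ++i)
        total += x[i];
    return total;
}
```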

