Problem description
This question was originally posed for SSE2 here. Since every single algorithm overlapped with ARMv7a+NEON's support for the same operations, the question was updated to include the ARMv7+NEON versions. At the request of a commenter, it is asked here separately to show that it is indeed a distinct topic and to invite alternative solutions that might be more practical for ARMv7+NEON. The net purpose of these questions is to find ideal implementations for consideration in WebAssembly SIMD.
Recommended answer
Signed 64-bit saturating subtract.
Assuming my tests using _mm_subs_epi16 are correct and translate 1:1 to NEON...
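#include <arm_neon.h>
// Emulates pcmpgtq: each 64-bit lane becomes all ones if a > b, else zero.
uint64x2_t pcmpgtq_armv7 (int64x2_t a, int64x2_t b) {
    // b - a saturates instead of wrapping, so its sign bit is set exactly when a > b.
    return vreinterpretq_u64_s64(vshrq_n_s64(vqsubq_s64(b, a), 63));
}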
This would certainly seem to be the fastest achievable way to emulate pcmpgtq.
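To see why the subtraction has to saturate, here is a minimal scalar sketch of one 64-bit lane. The helper names (sat_sub_s64, cmpgt_sat, cmpgt_wrap) are hypothetical and only model what vqsubq_s64 followed by a shift by 63 is assumed to do per lane; this is not the NEON code itself.

```c
#include <stdint.h>

/* Hypothetical scalar model of one vqsubq_s64 lane: x - y, clamped to the int64_t range. */
int64_t sat_sub_s64(int64_t x, int64_t y) {
    if (y > 0 && x < INT64_MIN + y) return INT64_MIN; /* would underflow */
    if (y < 0 && x > INT64_MAX + y) return INT64_MAX; /* would overflow */
    return x - y;
}

/* Saturating version: the sign bit of (b - a) is set exactly when a > b. */
int64_t cmpgt_sat(int64_t a, int64_t b) {
    uint64_t d = (uint64_t)sat_sub_s64(b, a);
    return -(int64_t)(d >> 63); /* 0 or -1, without relying on signed >> */
}

/* Wrapping version, shown only as a counterexample: overflow can flip the sign. */
int64_t cmpgt_wrap(int64_t a, int64_t b) {
    uint64_t d = (uint64_t)b - (uint64_t)a;
    return -(int64_t)(d >> 63);
}
```

With a = INT64_MAX and b = INT64_MIN, the wrapping difference is 1, so cmpgt_wrap reports "not greater", while the saturating difference clamps to INT64_MIN and cmpgt_sat returns -1 as expected.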
Hacker's Delight gives the following formulas:
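// return (a > b) ? -1LL : 0LL;
// Both variants assume b - a wraps on overflow and >> is an arithmetic shift.
int64_t cmpgt(int64_t a, int64_t b) {
    return ((b & ~a) | ((b - a) & ~(b ^ a))) >> 63;
}
// Equivalent alternative; renamed so both definitions can coexist.
int64_t cmpgt_alt(int64_t a, int64_t b) {
    return ((b - a) ^ ((b ^ a) & ((b - a) ^ b))) >> 63;
}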
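In ISO C, signed overflow in b - a is undefined and right-shifting a negative value is implementation-defined, so the formulas as written depend on the compiler. A sketch of the same two formulas using unsigned arithmetic, which wraps by definition, might look like the following. The names cmpgt_hd1 and cmpgt_hd2 are hypothetical, chosen here for illustration.

```c
#include <stdint.h>

/* First formula: (b & ~a) | ((b - a) & ~(b ^ a)), computed on unsigned values. */
int64_t cmpgt_hd1(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t t = (ub & ~ua) | ((ub - ua) & ~(ub ^ ua));
    return -(int64_t)(t >> 63); /* sign bit -> 0 or -1 */
}

/* Second formula: (b - a) ^ ((b ^ a) & ((b - a) ^ b)). */
int64_t cmpgt_hd2(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t d = ub - ua; /* wraps modulo 2^64 */
    uint64_t t = d ^ ((ub ^ ua) & (d ^ ub));
    return -(int64_t)(t >> 63);
}
```

Both variants only inspect the top bit of the intermediate value, so on NEON the final step would presumably map to the same shift by 63 used in the saturating version; the extra logic operations are the cost of not having a saturating subtract.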