Question
I need the number of operations per cycle that an ARM processor can execute, in particular for the Cortex-A7, Cortex-A9 and Cortex-A15. I can't find anything online!
Thanks.
I need it for calculating the theoretical peak performance.
Recommended answer
I have not looked into integer operations yet, but for single- and double-precision floating-point operations per cycle, this is what I have come up with so far (from flops-per-cycle-for-sandy-bridge-and-haswell-sse2-avx-avx2, peak-flops-per-cycle-for-arm11-and-cortex-a7-cores-in-raspberry-pi-1-and-2, and the Cortex-A9 NEON Media Processing Engine Technical Reference Manual).
Cortex-A7:
- 0.5 DP FLOPs/cycle: scalar VMLA.F64 every four cycles.
- 1.0 DP FLOPs/cycle: scalar VADD.F64 every cycle.
- 2.0 SP FLOPs/cycle: scalar VMLA.F32 every cycle.
- 2.0 SP FLOPs/cycle: 2-wide VMLA.F32 every other cycle.
Cortex-A9:
- 1.5 DP FLOPs/cycle: scalar VMLA.F64 + scalar VADD.F64 every other cycle.
- 4.0 SP FLOPs/cycle: 2-wide VMLA.F32 every cycle.
Cortex-A15:
- 2.0 DP FLOPs/cycle: scalar VMLA.F64 (or VFMA.F64) every cycle.
- 8.0 SP FLOPs/cycle: 4-wide VMLA.F32 (or VFMA.F32) every cycle.
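Each figure in the lists above follows from the same arithmetic: lanes per instruction, times FLOPs per lane (a multiply-accumulate such as VMLA counts as two, an add such as VADD as one), divided by the number of cycles between issues. A minimal Python sketch of that arithmetic follows; the helper name `flops_per_cycle` and its parameters are made up for illustration.

```python
MLA = 2  # a multiply-accumulate (VMLA/VFMA) counts as one multiply plus one add
ADD = 1  # a plain add (VADD) counts as one operation


def flops_per_cycle(lanes, flops_per_lane, cycles_between_issues):
    """FLOPs/cycle = lanes * FLOPs per lane / issue interval in cycles."""
    return lanes * flops_per_lane / cycles_between_issues


figures = {
    "A7 DP scalar VMLA.F64, every four cycles": flops_per_cycle(1, MLA, 4),
    "A7 DP scalar VADD.F64, every cycle": flops_per_cycle(1, ADD, 1),
    "A7 SP scalar VMLA.F32, every cycle": flops_per_cycle(1, MLA, 1),
    "A7 SP 2-wide VMLA.F32, every other cycle": flops_per_cycle(2, MLA, 2),
    # One VMLA.F64 plus one VADD.F64 completed per two-cycle window.
    "A9 DP scalar VMLA.F64 + VADD.F64, every other cycle": flops_per_cycle(1, MLA + ADD, 2),
    "A9 SP 2-wide VMLA.F32, every cycle": flops_per_cycle(2, MLA, 1),
    "A15 DP scalar VMLA.F64, every cycle": flops_per_cycle(1, MLA, 1),
    "A15 SP 4-wide VMLA.F32, every cycle": flops_per_cycle(4, MLA, 1),
}

if __name__ == "__main__":
    for description, value in figures.items():
        print(f"{value:4.1f}  {description}")
```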
One interesting observation is that Neon floating point is no faster than VFP on the Cortex-A7.
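For the theoretical peak the question asks about, the usual estimate multiplies the per-cycle figure by the clock frequency and the core count. Here is a rough sketch that takes the best figure per core from the lists in this answer. The core counts and clock speeds in the example are hypothetical placeholders, not the specs of any particular chip.

```python
# Best FLOPs/cycle per core, copied from the figures listed in this answer.
PEAK_FLOPS_PER_CYCLE = {
    "Cortex-A7": {"DP": 1.0, "SP": 2.0},
    "Cortex-A9": {"DP": 1.5, "SP": 4.0},
    "Cortex-A15": {"DP": 2.0, "SP": 8.0},
}


def peak_gflops(core, precision, n_cores, clock_ghz):
    """Theoretical peak in GFLOPS = cores * clock (GHz) * FLOPs/cycle."""
    return n_cores * clock_ghz * PEAK_FLOPS_PER_CYCLE[core][precision]


if __name__ == "__main__":
    # Hypothetical configurations (core count, clock in GHz), for illustration only.
    examples = [("Cortex-A7", 4, 1.0), ("Cortex-A9", 2, 1.0), ("Cortex-A15", 4, 2.0)]
    for core, n_cores, ghz in examples:
        sp = peak_gflops(core, "SP", n_cores, ghz)
        dp = peak_gflops(core, "DP", n_cores, ghz)
        print(f"{core}: {n_cores} cores @ {ghz} GHz -> {sp} SP GFLOPS, {dp} DP GFLOPS")
```

This is an upper bound: it assumes every core issues its best-case instruction at the stated rate with no stalls, which real code rarely sustains.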