Problem Description
Is there any (non-microoptimization) performance gain by coding
float f1 = 200f / 2;
in comparison to
float f2 = 200f * 0.5f;
A professor of mine told me a few years ago that floating-point division was slower than floating-point multiplication, without elaborating on why.
Does this statement hold for modern PC architecture?
Update 1
With respect to a comment, please also consider this case:
float f1;
float f2 = 2;
float f3 = 3;
for (int i = 0; i < 1e8; i++) {
    f1 = (i * f2 + i / f3) * 0.5; // or divide by 2.0f, respectively
}
Update 2
Quoting from the comments:
Solution
Yes, many CPUs can perform multiplication in 1 or 2 clock cycles, but division always takes longer (although FP division is sometimes faster than integer division).
If you look at this answer you will see that division can exceed 24 cycles.
Why does division take so much longer than multiplication? If you remember back to grade school, you may recall that multiplication can essentially be performed with many simultaneous additions. Division requires iterative subtraction that cannot be performed simultaneously so it takes longer. In fact, some FP units speed up division by performing a reciprocal approximation and multiplying by that. It isn't quite as accurate but is somewhat faster.