Problem description
-
I have some misconceptions about measuring FLOPS. On Intel architecture, is a FLOP one addition and one multiplication together? I read this somewhere online, and I have not found any discussion that contradicts it. I know that FLOP has a different meaning on different types of CPU.
How do I calculate my theoretical peak FLOPS? I am using an Intel(R) Core(TM)2 Duo CPU E7400 @ 2.80GHz. What exactly is the relationship between GHz and FLOPS? (Even Wikipedia's entry on FLOPS does not specify how to do this.)
I will be using the following method to measure the actual performance of my computer (in terms of FLOPS): the inner product of two vectors. For two vectors of size N, is the number of flops 2N(N - 1), if one addition or one multiplication is considered to be 1 flop? If not, how should I go about calculating this?
I know there are better ways to do this, but I would like to know whether my proposed calculations are right. I read somewhere about LINPACK as a benchmark, but I would still like to know how it's done.
Recommended answer
As for your 2nd question, the theoretical FLOPS calculation isn't too hard. It can be broken down into roughly:
(Number of cores) * (Number of execution units / core) * (cycles / second) * (Execution unit operations / cycle) * (floats-per-register / Execution unit operation)
A Core 2 Duo has 2 cores, and 1 execution unit per core. An SSE register is 128 bits wide. A float is 32 bits wide, so you can store 4 floats per register. I assume the execution unit does 1 SSE operation per cycle. So, with the clock rate in GHz, it should be:
2 * 1 * 2.8 * 1 * 4 = 22.4 GFLOPS
It matches:
http://www.intel.com/support/processors/sb/cs-023143.htm
This number is obviously a purely theoretical best-case figure. Real-world performance will most likely not come close to it, for a variety of reasons. It's probably not worth trying to directly correlate FLOPS with actual application runtime; you'd be better off timing the computations your application actually uses.