Problem description
I ran a performance test on my computer (Windows 10) with an Intel Xeon CPU E5-1620 v3 @ 3.5 GHz and got results similar to a Raspberry Pi's. My beard grew while I waited.
I obtained:
integer sums: 2184 Mops (mega-operations per second), as expected
double divisions: 15.6 - 18.32 Mops
double multiplications: 344 - 430 Mops
double sums: 881 - 1178 Mops
float divisions: 17.3 - 19.1 Mops
Update: I tested on an i5, and divisions ran at less than 21 Mops there too.
My questions are:
Does the Intel E5 have a floating-point coprocessor?
Can I use a compiler directive to make it run faster?
Would it run faster on an i7 processor?
What I have tried:
This is my code. Please try it on any computer; it runs without changes:
#include <iostream>
#include <cstdio>   // getchar()
#include <ctime>    // clock(), clock_t, CLOCKS_PER_SEC
#pragma warning(disable:4996) // MSVC only: disable deprecation warnings
using namespace std;

clock_t start, stop;

// Call timer() with no arguments to reset the start time.
// Otherwise print the elapsed time and the throughput, then restart the clock.
void timer(const char *title = "", int data_size = 1) {
    stop = clock();
    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (*title)
        cout << title << " time = " << seconds << " s = "
             << 1e-6 * data_size / seconds << " Mops/s" << endl;
    start = clock();
}

int main() {
    cout << "Perform test in Release mode. Results will be wrong in Debug mode" << endl;
    long long isum = 0;            // long long: the sum overflows a 32-bit int
    int size = 100 * 1024 * 1024;
    timer();                       // reset the timer
    for (int i = 0; i < size; i++) isum += i;
    timer("Time for 100 Mega int sums", size);

    double dsum = 1.0;
    for (int i = 0; i < size; i++) dsum = dsum / 1.1111;
    timer("Time for 100 Mega double divisions", size);

    double d2 = 1.111;
    dsum += 0.1;
    for (int i = 0; i < size; i++) dsum /= d2;
    timer("Time for 100 Mega double divisions-2", size);

    for (int i = 0; i < size; i++) dsum = dsum * d2;
    timer("Time for 100 Mega double multiplications", size);

    for (int i = 0; i < size; i++) dsum = dsum + d2;
    timer("Time for 100 Mega double sums", size);

    float fsum = 1.0f;
    for (int i = 0; i < size; i++) fsum = fsum / 1.1111f;
    timer("Time for 100 Mega float divisions", size);

    // Printing the results keeps the optimizer from removing the loops.
    cout << endl << "Ignore the following line (it only forces the loops to run after compiler optimizations):" << endl;
    cout << isum << " " << dsum << " " << fsum << endl;
    cout << "=== FIN ===" << endl;
    getchar();
    return 0;
}
Yes. All modern x86 CPUs have a built-in x87 FPU as well as vector units (SSE, AVX).
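Yes, but it depends on the compiler and on whether you can accept reduced error handling and results that are not strictly IEEE 754 compliant. Most compilers have some kind of fast-math option for this purpose (for example /fp:fast in MSVC or -ffast-math in GCC and Clang). Depending on the CPU used, you can also enable scalar SSE instructions instead of the x87 FPU (for 32-bit builds, e.g. /arch:SSE2 in MSVC or -msse2 -mfpmath=sse in GCC; 64-bit builds use SSE by default).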
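It depends on the clock rate of the CPU (the x87 FPU and the SSE units both run at the core clock) and on the number of clock cycles each instruction needs, which is defined per instruction and can differ between processor generations.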
Floating-point divisions require far more clock cycles than additions or multiplications (8 to 20 times as many as a multiplication). This applies to all kinds of FPUs, not only to x86 ones. Avoid them where high performance is required, for example by multiplying by the reciprocal value inside loops.
From the Intel® 64 and IA-32 Architectures Optimization Reference Manual