Is there any benefit to not using double (and using, say, float) on a 64-bit processor?
Problem description
I always use double for my calculations, but double offers far better accuracy than I need (or than makes sense, considering that most of the calculations I do are approximations to begin with).
But since the processor is already 64-bit, I do not expect that using a type with fewer bits will be of any benefit.
Am I right or wrong? How would I optimize for speed? (I understand that smaller types would be more memory-efficient.)
Here is the test:
#include <cmath>
#include <ctime>
#include <cstdio>

// Allocate an m x n matrix as one contiguous data block plus a row-pointer table.
template <typename T>
void creatematrix(int m, int n, T**& M) {
    M = new T*[m];
    T* M_data = new T[m * n];
    for (int i = 0; i < m; ++i) {
        M[i] = M_data + i * n;
    }
}

int main() {                        // main must return int, not void
    clock_t start, end;
    double diffs;
    const int N = 4096;
    const int rep = 8;
    float **m1, **m2;
    creatematrix(N, N, m1);         // note: the matrices are left uninitialized in this test
    creatematrix(N, N, m2);
    start = clock();
    for (int k = 0; k < rep; k++) {
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++)
                m1[i][j] = sqrt(m1[i][j] * m2[i][j] + 0.1586);
        }
    }
    end = clock();
    diffs = (end - start) / (double)CLOCKS_PER_SEC;
    printf("time = %lf\n", diffs);
    delete[] m1[0];                 // free the contiguous data block first
    delete[] m1;                    // then the row-pointer table
    delete[] m2[0];
    delete[] m2;
    getchar();
    return 0;
}
There was no time difference between double and float; however, when the square root is not used, float is twice as fast.
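For reference, the same test can be written with the element type as a template parameter, so that float and double are timed by identical code. The sketch below is only illustrative (the name benchmark() is chosen here, not taken from the post) and uses a flat, initialized array instead of the row-pointer matrices above; the constant is kept in T so the float version stays in single precision.

#include <cmath>
#include <ctime>
#include <cstdio>

template <typename T>
double benchmark(int N, int rep) {
    const int total = N * N;            // 4096*4096 fits comfortably in int
    T* a = new T[total];
    T* b = new T[total];
    for (int i = 0; i < total; ++i) { a[i] = T(1); b[i] = T(2); }

    clock_t start = clock();
    for (int k = 0; k < rep; ++k)
        for (int i = 0; i < total; ++i)
            a[i] = std::sqrt(a[i] * b[i] + T(0.1586));
    clock_t end = clock();

    volatile T sink = a[0];             // use the result so the loop cannot be discarded
    (void)sink;
    delete[] a;
    delete[] b;
    return (end - start) / (double)CLOCKS_PER_SEC;
}

int main() {
    printf("float : %lf s\n", benchmark<float>(4096, 8));
    printf("double: %lf s\n", benchmark<double>(4096, 8));
    return 0;
}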
There are a couple of ways in which floats can be faster:
- Faster I/O: you have only half the bits to move between disk/memory/cache/registers
- Typically the only operations that are slower are square root and division. As an example, on Haswell a DIVSS (float division) takes 7 clock cycles, whereas a DIVSD (double division) takes 8-14 (source: Agner Fog's instruction tables).
- If you can take advantage of SIMD instructions, then you can handle twice as many values per instruction (i.e. in a 128-bit SSE register you can operate on 4 floats, but only 2 doubles); see the SSE sketch after this list.
- Special functions (log, sin) can use lower-degree polynomials: e.g. the openlibm implementation of log uses a degree-7 polynomial, whereas logf only needs degree 4.
- If you need higher intermediate precision, you can simply promote float to double, whereas for a double you need either software double-double arithmetic or the slower long double; see the accumulation sketch below.
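To make the SIMD point concrete, here is a minimal sketch of my own (not part of the original answer), assuming an x86 compiler with SSE2 enabled (e.g. g++ -O2 -msse2): one 128-bit register holds 4 floats but only 2 doubles, so each sqrt instruction covers twice as many float elements.

#include <immintrin.h>
#include <cstdio>

int main() {
    // one 128-bit register: 4 floats vs 2 doubles
    alignas(16) float  f[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    alignas(16) double d[2] = {1.0, 2.0};

    __m128  vf = _mm_load_ps(f);   // load 4 floats
    __m128d vd = _mm_load_pd(d);   // load 2 doubles

    vf = _mm_sqrt_ps(vf);          // one instruction: 4 square roots
    vd = _mm_sqrt_pd(vd);          // one instruction: 2 square roots

    _mm_store_ps(f, vf);
    _mm_store_pd(d, vd);

    printf("floats : %f %f %f %f\n", f[0], f[1], f[2], f[3]);
    printf("doubles: %f %f\n", d[0], d[1]);
    return 0;
}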
Note that these points hold for 32-bit architectures as well: unlike integers, there's nothing particularly special about having the size of the format match your architecture; i.e. on most machines doubles are just as "native" as floats.
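And a minimal sketch of the promotion point, again my own illustration rather than the answer's code: the inputs and the result stay float, but the running sum is accumulated in double to reduce round-off; the function name mean() is chosen here just for the example.

#include <cstdio>

float mean(const float* x, int n) {
    double sum = 0.0;              // double accumulator: cheap extra precision
    for (int i = 0; i < n; ++i)
        sum += x[i];               // each float is promoted to double here
    return (float)(sum / n);       // demote once at the end
}

int main() {
    float v[5] = {0.1f, 0.2f, 0.3f, 0.4f, 0.5f};
    printf("mean = %f\n", mean(v, 5));
    return 0;
}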