Is there any benefit to not using double (and using, say, float) on a 64-bit processor?
Problem description
I always use double for my calculations, but double offers far better accuracy than I need (or than makes sense, considering that most of the calculations I do are approximations to begin with).
But since the processor is already 64-bit, I do not expect that using a type with fewer bits will be of any benefit.
Am I right or wrong? How would I optimize for speed? (I understand that smaller types would be more memory-efficient.)
Here is the test:
#include <cmath>
#include <ctime>
#include <cstdio>

// Allocate an m x n matrix as one contiguous data block plus a row-pointer table.
template <typename T>
void creatematrix(int m, int n, T**& M) {
    M = new T*[m];
    T* M_data = new T[m * n];
    for (int i = 0; i < m; ++i) {
        M[i] = M_data + i * n;
    }
}

int main() {                        // main must return int, not void
    clock_t start, end;
    double diffs;
    const int N = 4096;
    const int rep = 8;
    float **m1, **m2;
    creatematrix(N, N, m1);         // note: the matrices are left uninitialized in this test
    creatematrix(N, N, m2);
    start = clock();
    for (int k = 0; k < rep; k++) {
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++)
                m1[i][j] = sqrt(m1[i][j] * m2[i][j] + 0.1586);
        }
    }
    end = clock();
    diffs = (end - start) / (double)CLOCKS_PER_SEC;
    printf("time = %lf\n", diffs);
    delete[] m1[0];                 // free the contiguous data block first
    delete[] m1;                    // then the row-pointer table
    delete[] m2[0];
    delete[] m2;
    getchar();
    return 0;
}
There was no time difference between double and float; however, when the square root is not used, float is twice as fast.
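For reference, the same test can be written with the element type as a template parameter, so that float and double are timed by identical code. The sketch below is only illustrative (the name benchmark() is chosen here, not taken from the post) and uses a flat, initialized array instead of the row-pointer matrices above; the constant is kept in T so the float version stays in single precision.

#include <cmath>
#include <ctime>
#include <cstdio>

template <typename T>
double benchmark(int N, int rep) {
    const int total = N * N;            // 4096*4096 fits comfortably in int
    T* a = new T[total];
    T* b = new T[total];
    for (int i = 0; i < total; ++i) { a[i] = T(1); b[i] = T(2); }

    clock_t start = clock();
    for (int k = 0; k < rep; ++k)
        for (int i = 0; i < total; ++i)
            a[i] = std::sqrt(a[i] * b[i] + T(0.1586));
    clock_t end = clock();

    volatile T sink = a[0];             // use the result so the loop cannot be discarded
    (void)sink;
    delete[] a;
    delete[] b;
    return (end - start) / (double)CLOCKS_PER_SEC;
}

int main() {
    printf("float : %lf s\n", benchmark<float>(4096, 8));
    printf("double: %lf s\n", benchmark<double>(4096, 8));
    return 0;
}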
There are a couple of ways in which floats can be faster:
- Faster I/O: you have only half the bits to move between disk/memory/cache/registers
- Typically the only operations that are slower are square root and division. As an example, on Haswell a DIVSS (float division) takes 7 clock cycles, whereas a DIVSD (double division) takes 8-14 (source: Agner Fog's instruction tables).
- If you can take advantage of SIMD instructions, then you can handle twice as many values per instruction (i.e. in a 128-bit SSE register you can operate on 4 floats, but only 2 doubles); see the SSE sketch after this list.
- Special functions (log, sin) can use lower-degree polynomials: e.g. the openlibm implementation of log uses a degree-7 polynomial, whereas logf only needs degree 4.
- If you need higher intermediate precision, you can simply promote float to double, whereas for a double you need either software double-double arithmetic or the slower long double; see the accumulation sketch below.
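To make the SIMD point concrete, here is a minimal sketch of my own (not part of the original answer), assuming an x86 compiler with SSE2 enabled (e.g. g++ -O2 -msse2): one 128-bit register holds 4 floats but only 2 doubles, so each sqrt instruction covers twice as many float elements.

#include <immintrin.h>
#include <cstdio>

int main() {
    // one 128-bit register: 4 floats vs 2 doubles
    alignas(16) float  f[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    alignas(16) double d[2] = {1.0, 2.0};

    __m128  vf = _mm_load_ps(f);   // load 4 floats
    __m128d vd = _mm_load_pd(d);   // load 2 doubles

    vf = _mm_sqrt_ps(vf);          // one instruction: 4 square roots
    vd = _mm_sqrt_pd(vd);          // one instruction: 2 square roots

    _mm_store_ps(f, vf);
    _mm_store_pd(d, vd);

    printf("floats : %f %f %f %f\n", f[0], f[1], f[2], f[3]);
    printf("doubles: %f %f\n", d[0], d[1]);
    return 0;
}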
Note that these points hold for 32-bit architectures as well: unlike integers, there's nothing particularly special about having the size of the format match your architecture; i.e. on most machines doubles are just as "native" as floats.
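And a minimal sketch of the promotion point, again my own illustration rather than the answer's code: the inputs and the result stay float, but the running sum is accumulated in double to reduce round-off; the function name mean() is chosen here just for the example.

#include <cstdio>

float mean(const float* x, int n) {
    double sum = 0.0;              // double accumulator: cheap extra precision
    for (int i = 0; i < n; ++i)
        sum += x[i];               // each float is promoted to double here
    return (float)(sum / n);       // demote once at the end
}

int main() {
    float v[5] = {0.1f, 0.2f, 0.3f, 0.4f, 0.5f};
    printf("mean = %f\n", mean(v, 5));
    return 0;
}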