c - Why is the cos function in math.h faster than the x86 fcos instruction

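cos() from math.h runs faster than the x86 asm fcos instruction.
The code below compares x86 fcos with cos() from math.h.
In this code, 1,000,000 executions of asm fcos take 150 ms, while 1,000,000 calls to cos() take only 80 ms.
How is fcos implemented on x86?
Why is fcos so much slower than cos()?
My environment is an Intel i7-6820HQ, Windows 10, and Visual Studio 2017.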

#include <stdio.h>  // printf
#include <time.h>   // clock, clock_t
#include <math.h>   // cos

int main()
{
  int i;
  const int i_max = 1000000;

  float c = 10000;
  float *d = &c;

  float start_value = 8.333333f;
  float* pstart_value = &start_value;
  clock_t a, b;
  a = clock();

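  // MSVC inline assembly (32-bit x86 builds only):
  // load start_value onto the x87 register stack as st(0)
  __asm {
    mov edx, pstart_value;

    fld [edx];
  }

  // each fcos replaces st(0) with cos(st(0))
  for (i = 0; i < i_max; i++) {
    __asm {
        fcos;
    }
  }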


  b = clock();
  printf("asm time = %ld\n", (long)(b - a));

  a = clock();
  double y;
  for (i = 0; i < i_max; i++) {
    start_value = cos(start_value);
  }

  b = clock();
  printf("math time = %ld\n", (long)(b - a));
  return 0;
}

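As I understand it, a single asm instruction is usually faster than a function call.
Why is fcos so slow in this case?
Update:
I ran the same code on another laptop with an i7-6700HQ.
On that laptop, 1,000,000 fcos executions take only 51 ms. Why is there such a large difference between the two CPUs?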

Best answer

I bet the answer is simple. You never use the result of cos, so the call gets optimized away, as in this example:
https://godbolt.org/z/iw-nft
Change the variable to volatile to force the call:
https://godbolt.org/z/9_dpMs
Another guess:
Perhaps your cos implementation uses a lookup table. In that case it could be faster than the hardware implementation.

For more on "c - Why is the cos function in math.h faster than the x86 fcos instruction", see the similar question on Stack Overflow: https://stackoverflow.com/questions/55665744/