Computation is optimized only if the variable updated in the loop is local

Problem description

For the following function, the optimized code is vectorized and the computation is performed in registers (the result is returned in eax). The generated machine code can be seen, e.g., here: https://godbolt.org/z/VQEBV4.

int sum(int *arr, int n) {
  int ret = 0;
  for (int i = 0; i < n; i++)
    ret += arr[i];
  return ret;
}

However, if I make ret a global variable (or a parameter of type int&), vectorization is not used and the compiler stores the updated ret to memory in each iteration. Machine code: https://godbolt.org/z/NAmX4t.

int ret = 0;

int sum(int *arr, int n) {
  for (int i = 0; i < n; i++)
    ret += arr[i];
  return ret;
}
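
The int& variant mentioned above is not spelled out in the post. A minimal sketch of what it might look like follows; the names sum_ref and demo and the parameter order are made up for illustration, and according to the description above this form is compiled like the global version rather than like the local one.

```cpp
#include <cassert>

// Hypothetical variant (sum_ref is not a name from the original post):
// the accumulator is passed in by reference instead of being a global.
int sum_ref(int *arr, int n, int &ret) {
  for (int i = 0; i < n; i++)
    ret += arr[i];  // updates the caller's variable through the reference
  return ret;
}

// Small self-check on sample data; returns the accumulated total.
int demo() {
  int data[] = {1, 2, 3, 4};
  int total = 0;
  int r = sum_ref(data, 4, total);
  assert(r == total);  // the return value and the referenced variable agree
  return total;
}
```

Pasting sum_ref into Compiler Explorer next to the two versions above should make it possible to compare the generated loops side by side.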

I don't understand why the optimizations (vectorization, computation in registers) are prevented in the latter case. There is no threading, and the increments are not even performed atomically. Moreover, this behavior seems to be consistent across compilers (GCC, Clang, Intel), so I believe there must be some reason for it.

Recommended answer