Problem description
I have written a simple Gaussian elimination algorithm using a std::vector of doubles in C++ (gcc / Linux). I have now seen that the runtime depends on the compiler's optimization level (up to 5 times faster with -O3). I wrote a small test program and got similar results. The problem is neither the allocation of the vector nor any resizing.
The simple fact is that a statement like:
v[i] = x + y / z;
(or something like that) is much slower without optimization. I think the problem is the index operator. Without compiler optimization, the std::vector is slower than a raw double *v, but when I turn on optimization the performance is equal and, to my surprise, even access to the raw double *v is faster.
Is there an explanation for this behaviour? I'm really not a professional developer, but I thought the compiler should be able to translate statements like the one above more or less directly into hardware instructions. Why is there a need to turn on optimization and, more importantly, what is the disadvantage of optimization? (If there is none, I wonder why optimization is not the default.)
Here is my vector test code:
#include <cstdio>   // printf, scanf
#include <cstdlib>  // malloc, free
#include <ctime>    // time
#include <vector>

const long int count = 100000;  // number of times each fill loop is repeated
const double pi = 3.1416;       // only used by the commented-out variant

// Fill a malloc'ed array `count` times and print the elapsed whole seconds.
void C_array (long int size)
{
    long int start = time(0);
    double *x = (double*) malloc (size * sizeof(double));
    for (long int n = 0; n < count; n++)
        for (long int i = 0; i < size; i++)
            x[i] = i;
            //x[i] = pi * (i-n);
    printf ("C array : %li s\n", time(0) - start);
    free (x);
}

// Same loops, but writing through std::vector's operator[].
void CPP_vector (long int size)
{
    long int start = time(0);
    std::vector<double> x(size);
    for (long int n = 0; n < count; n++)
        for (long int i = 0; i < size; i++)
            x[i] = i;
            //x[i] = pi * (i-n);
    printf ("C++ vector: %li s\n", time(0) - start);
}

int main ()
{
    printf ("Size of vector: ");
    long int size;
    scanf ("%li", &size);
    C_array (size);
    CPP_vector (size);
    return 0;
}
I got some weird results. A standard g++ compile (no optimization flags) produces a runtime of 8 s (C array) or 18 s (std::vector) for a vector size of 20,000. If I use the more complex, commented-out line (the one after the //) instead, the runtime is 8 / 15 s (yes, faster). If I turn on -O3, the runtime is 5 / 5 s for a vector size of 40,000.
Recommended answer
Why have optimized and debug builds?
Optimization may completely reorder the sequence of instructions, eliminate variables, inline function calls, and leave the executable code so far removed from the source code that you cannot debug it. So one of the reasons for not using optimization is to keep the ability to debug the code. When your code is (or when you believe it is) fully debugged, you can turn on optimization to produce a release build.
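To make the inlining point concrete for the question's case, here is a minimal sketch. The names fill_raw and fill_vector are hypothetical and not part of the original test program. Both functions do the same work, but in an unoptimized build the vector's operator[] is usually left as a real function call for every element, while an optimizing build typically inlines it so that both loops end up as similar machine code. The exact result depends on the compiler and its version.

```cpp
#include <cstddef>
#include <vector>

// Fill through a raw pointer: x[i] is plain pointer arithmetic.
void fill_raw(double* x, std::size_t size) {
    for (std::size_t i = 0; i < size; ++i)
        x[i] = static_cast<double>(i);
}

// Fill through std::vector: x[i] calls std::vector<double>::operator[].
// Without optimization this call is usually not inlined; with -O2 or -O3
// it usually is (assumption: typical g++ behaviour, not a guarantee).
void fill_vector(std::vector<double>& x) {
    for (std::size_t i = 0; i < x.size(); ++i)
        x[i] = static_cast<double>(i);
}
```

Comparing the assembly produced by g++ -S with and without -O3 for these two functions is one way to check whether this is what happens on a given setup.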
Why is debug code slow?
- One thing to keep in mind is that a debug version of the STL may contain additional checks for boundaries and for the validity of iterators. This can slow down the code by a factor of 10. This is known to be an issue with the Visual C++ STL, but in your case you are not using it. I don't know the current state of gcc's STL in this respect.
- Another possibility is that you are accessing memory in a non-linear sequence, producing lots of cache misses. In debug mode, the compiler will respect your code as written and produce this inefficient access pattern. But when optimization is on, it may rewrite your accesses to be sequential and avoid the cache misses.
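The cache point in the second bullet can be sketched as follows. This is an illustration under assumptions, not the asker's code: the function names and the flat row-by-row matrix layout are hypothetical. Both functions compute the same sum, but the first walks adjacent elements while the second jumps n elements on every step, which tends to cause many more cache misses once the matrix no longer fits in cache.

```cpp
#include <cstddef>
#include <vector>

// Sum an n x n matrix stored row by row in one contiguous vector.
// The inner loop visits adjacent elements (sequential access).
double sum_row_major(const std::vector<double>& m, std::size_t n) {
    double total = 0.0;
    for (std::size_t row = 0; row < n; ++row)
        for (std::size_t col = 0; col < n; ++col)
            total += m[row * n + col];
    return total;
}

// Same sum, but the inner loop strides n elements at a time
// (non-linear access over the same memory).
double sum_column_major(const std::vector<double>& m, std::size_t n) {
    double total = 0.0;
    for (std::size_t col = 0; col < n; ++col)
        for (std::size_t row = 0; row < n; ++row)
            total += m[row * n + col];
    return total;
}
```

Whether an optimizer actually interchanges such loops depends on the compiler and flags, so timing both variants is the only reliable check.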
What to do?
You could try to show a simple compilable example exhibiting the behavior. We could then compile it and look at the assembly to explain what is really going on. If you are hitting a cache issue, the size of the data you are processing matters.
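When building such an example, timing with time(0) only resolves whole seconds. A possible alternative is sketched below using std::chrono::steady_clock from the standard library. The helper names seconds_taken and fill_repeatedly are hypothetical, and the checksum is only there so that the final contents of the vector are actually used.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Run f once and return the elapsed wall-clock time in seconds.
template <typename F>
double seconds_taken(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Fill a vector `repetitions` times and return the sum of its final
// contents, so the result of the loops is observably used.
double fill_repeatedly(std::size_t size, std::size_t repetitions) {
    std::vector<double> x(size);
    for (std::size_t n = 0; n < repetitions; ++n)
        for (std::size_t i = 0; i < size; ++i)
            x[i] = static_cast<double>(i + n);
    double checksum = 0.0;
    for (double value : x)
        checksum += value;
    return checksum;
}
```

A caller could wrap the work in a lambda, for example seconds_taken([&] { checksum = fill_repeatedly(20000, 100000); }), and compare the result for builds with and without -O3.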
Links
- Visual C++ STL is slow in debug mode: http://marknelson.us/2011/11/28/vc-10-hash-table-performance-problems/
- What the debug version of the STL does with Visual C++: http://channel9.msdn.com/Series/C9-Lectures-Stephan-T-Lavavej-Advanced-STL/C9-Lectures-Stephan-T-Lavavej-Advanced-STL-3-of-n
- Cache misses and their impact: http://channel9.msdn.com/Events/Build/2014/2-661 , especially from 29'27"
- Cache again: https://www.youtube.com/watch?v=fHNmRkzxHWs at 36'34"