问题描述
这是一个简单的 C++ 代码,用于比较迭代二维数组行专业和列专业.
Here is simple C++ code that compare iterating 2D array row major with column major.
#include <iostream>
#include <ctime>
using namespace std;
const int d = 10000;
int** A = new int* [d];
int main(int argc, const char * argv[]) {
for(int i = 0; i < d; ++i)
A[i] = new int [d];
clock_t ColMajor = clock();
for(int b = 0; b < d; ++b)
for(int a = 0; a < d; ++a)
A[a][b]++;
double col = static_cast<double>(clock() - ColMajor) / CLOCKS_PER_SEC;
clock_t RowMajor = clock();
for(int a = 0; a < d; ++a)
for(int b = 0; b < d; ++b)
A[a][b]++;
double row = static_cast<double>(clock() - RowMajor) / CLOCKS_PER_SEC;
cout << "Row Major : " << row;
cout << "\nColumn Major : " << col;
return 0;
}
不同d值的结果:
d = 10^3 :
行主要:0.002431
列专业:0.017186
Column Major : 0.017186
d = 10^4 :
行专业:0.237995
专栏专业:2.04471
Column Major : 2.04471
d = 10^5
行专业:53.9561
专栏专业:444.339
Column Major : 444.339
现在的问题是为什么行专业比列专业更快?
Now the question is why row major is faster than column major?
推荐答案
这显然取决于您使用的机器,但一般来说:
It obviously depends on the machine you're on but very generally speaking:
您的计算机将程序内存的一部分存储在缓存中,该缓存的延迟比主内存小得多(即使在补偿缓存命中时间时也是如此).
Your computer stores parts of your program's memory in a cache that has a much smaller latency than main memory (even when compensating for cache hit time).
C 数组按行主要顺序连续存储.这意味着如果您请求元素 x
,则元素 x+1
将存储在主存储器中紧跟在 x
存储位置之后的位置.
C arrays are stored in a contiguous by row major order. This means if you ask for element x
, then element x+1
is stored in main memory at a location directly following where x
is stored.
您的计算机缓存通常会先发制人"地用尚未使用的内存地址填充缓存,但这些地址在本地接近您的程序已使用的内存.把你的电脑想象成这样:好吧,你想要地址 X 的内存,所以我假设你很快就会想要 X+1 的内存,因此我会先发制人地为你获取它并将它放在你的缓存中".
It's typical for your computer cache to "pre-emptively" fill cache with memory addresses that haven't been used yet, but that are locally close to memory that your program has used already. Think of your computer as saying: "well, you wanted memory at address X so I am going to assume that you will shortly want memory at X+1, therefore I will pre-emptively grab that for you and place it in your cache".
当您通过行主顺序枚举数组时,您正在以这样一种方式枚举它,即它以连续的方式存储在内存中,并且您的机器已经自由地为您预先加载这些地址到缓存中因为它猜到你想要它.因此,您可以获得更高的缓存命中率.当您以另一种非连续方式枚举数组时,您的机器可能无法预测您正在应用的内存访问模式,因此它无法为您预先将内存地址拉入缓存,而您赢了不会产生那么多缓存命中,因此必须更频繁地访问主内存,这比您的缓存慢.
When you enumerate your array via row major order, you're enumerating it in such a way where it's stored in a contiguous manner in memory, and your machine has already taken the liberty of pre-loading those addresses into cache for you because it guessed that you wanted it. Therefore you achieve a higher rate of cache hits. When you're enumerating an array in another non-contiguous manner then your machine likely won't predict the memory access pattern you're applying, so it wont be able to pre-emptively pull memory addresses into cache for you, and you won't incur as many cache hits, so main memory will have to be accessed more frequently which is slower than your cache.
此外,这可能更适合 https://cs.stackexchange.com/ 因为您的系统缓存的行为方式是在硬件中实现,空间局部性问题似乎更适合那里.
Also, this might be better suited for https://cs.stackexchange.com/ because the way your system cache behaves is implemented in hardware, and spatial locality questions seem better suited there.
这篇关于为什么迭代二维数组行专业比列专业更快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!