在嵌套循环中访问数组时缓存未命中

在嵌套循环中访问数组时缓存未命中

本文介绍了在嵌套循环中访问数组时缓存未命中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以我的教授提出了这个问题,我不知道为什么 vector2 vector1 更快,缓存丢失率更低.

So I have this question from my professor, and I can not figure out why vector2 is faster and has less cache misses than vector1.

假定下面的代码是有效的可编译C代码.

Assume that the code below is a valid compilable C code.

Vector2:

void incrementVector2(INT4* v, int n) {
     for (int k = 0; k < 100; ++k) {
          for (int i = 0; i < n; ++i) {
               v[i] = v[i] + 1;
          }
     }
}

向量1:

void incrementVector1(INT4* v, int n) {
     for (int i = 0; i < n; ++i) {
          for (int k = 0; k < 100; ++k) {
               v[i] = v[i] + 1;
          }
     }
}

注意::INT4表示整数为4个字节.

NOTE: INT4 means the integer is 4 Bytes in size.

根据CPU规格:缓存大小= 12288KB,行大小= 64B,并且仅考虑此单个缓存与主内存的交互.

In terms of CPU specs: cache size = 12288KB, line size=64B and only considering this single cache interacting with main memory.

问题

为什么 vector2 的运行时间比 vector1 快?为什么 vector1 会比 vector2 具有更多的高速缓存未命中?

Why does vector2 have a faster runtime than vector1? And why vector1 will have more cache misses than vector2?

我和一些同学为此工作了一段时间,无法解决.我们认为这可能是由于 vector2 具有更好的空间定位性,因为数组被内部循环不断访问,而在 vector1 中,只有一个元素是一次访问?我们不确定这是否正确,也不确定如何将缓存行引入其中.

Me and a few classmates worked on this for a while and couldn't figure it out. We thought it could be due to the fact that vector2 has better spatial locatlity, since the array is been accessed by the inner loop constantly, while in vector1, only one element is accessed at a time?we are not sure if this is correct, and also not sure how to bring cache lines in to this either.

推荐答案

好吧,这两个代码具有相同的访问模式,以步长为1遍历数组 v .缓存在空间局部性方面,两个代码是相同的.但是,第二个代码:

Well, both codes have the same accessing pattern, iterating over the array v with a stride of 1. Cache spacial-locality-wise both codes are the same. However, the second code:

void incrementVector1(INT4* v, int n) {
     for (int i = 0; i < n; ++i) {
          for (int k = 0; k < 100; ++k) {
               v[i] = v[i] + 1;
          }
     }
}

具有更好的时间局部性,因为您访问同一元素100次,而在以下位置:

Has a better temporal-locality because you access the same element 100 times, whereas in:

void incrementVector2(INT4* v, int n) {
     for (int k = 0; k < 100; ++k) {
          for (int i = 0; i < n; ++i) {
               v[i] = v[i] + 1;
          }
     }
}

每'n'次迭代您只能访问一次.

you only access it once on every 'n' iterations.

因此,要么您犯了一个错误,要么您的老师在玩某种奇怪的游戏,要么我错过了明显的东西.

So either you did a mistake, your teacher is playing some kind of strange game or I am missing something obvious.

这篇关于在嵌套循环中访问数组时缓存未命中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-29 06:23