c++ - Simple test to measure cache line size

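Starting from this article, Igor Ostrovsky's Gallery of Processor Cache Effects, I wanted to play with his examples on my own machine.
This is my code for the first example, which looks at how touching different cache lines affects the running time: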

#include <iostream>
#include <time.h>

using namespace std;

int main(int argc, char* argv[])
{
    int step = 1;

    const int length = 64 * 1024 * 1024;
    int* arr = new int[length];

    timespec t0, t1;
    clock_gettime(CLOCK_REALTIME, &t0);
    for (int i = 0; i < length; i += step)
        arr[i] *= 3;
    clock_gettime(CLOCK_REALTIME, &t1);

    long int duration = (t1.tv_nsec - t0.tv_nsec);
    if (duration < 0)
        duration = 1000000000 + duration;

    cout<< step << ", " << duration / 1000 << endl;

    return 0;
}

With various values of step, I don't see any jump in the running time:

step, microseconds
1, 451725
2, 334981
3, 287679
4, 261813
5, 254265
6, 246077
16, 215035
32, 207410
64, 202526
128, 197089
256, 195154

I expected to see something like this: [plot from the article, not preserved]

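I tested this on Ubuntu 13 with a Xeon X5450, and compiled it with: g++ -O0.
Is something wrong with my code, or are the results actually fine?
Any insight into what I'm missing would be greatly appreciated.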

Best answer

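As I understand it, you want to observe the effect of the cache line size, so I suggest using the tool cachegrind, which is part of the valgrind tool set. Your approach is right, but it does not get you close to the results.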

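#include <iostream>
#include <stdlib.h>

using namespace std;

int main(int argc, char* argv[])
{
    // The step (counted in ints, not bytes) comes from the command line.
    if (argc < 2) {
        cerr << "usage: " << argv[0] << " <step>" << endl;
        return 1;
    }
    int step = atoi(argv[1]);
    if (step <= 0) {
        // atoi returns 0 for non-numeric input; a step of 0 would loop forever.
        cerr << "step must be a positive integer" << endl;
        return 1;
    }

    const int length = 64 * 1024 * 1024;
    int* arr = new int[length];

    // Touch every step-th element; cachegrind counts the resulting cache misses.
    for (int i = 0; i < length; i += step)
        arr[i] *= 3;

    delete[] arr;
    return 0;
}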

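Run the tool with valgrind --tool=cachegrind ./a.out $cacheline-size, where $cacheline-size stands for the step value you want to try, and you should see the results. Once you plot those numbers, you should get exactly the result you expected. Happy experimenting!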