c++ - 我可以“强制” Cachegrind分析操作(或生产线)吗？

我正在对两种搜索算法的缓存行为进行基准测试，这两种算法使用Cachegrind对一系列项目进行操作。我在一个向量中有n个项目，另一个向量包含所有有效索引。我在第二个向量上使用std :: random_shuffle，然后对第一个向量中的项目执行n次成功查找。我正在基准测试的功能大致如下：

template <typename Iterator>
void lookup_in_random_order(Iterator begin, Iterator end)
{
    const std::size_t N = std::distance(begin, end);
    std::vector<std::size_t> idx(N);
    std::iota(idx.begin(), idx.end(), 0);

    std::srand(std::time(0));
    std::random_shuffle(idx.begin(), idx.end());

    // Warm the cache -- I don't care about measuring this loop.
    for(std::size_t i = 0; i < N; ++i)
        my_search(begin, end, idx[i]);

    std::random_shuffle(idx.begin(), idx.end());

    // This one I would care about!
    for(std::size_t i = 0; i < N; ++i)
    {
        int s = idx[i];
        // Especially this line, of course.
        my_search(begin, end, s);
    }
}

我用g ++（用-g和-O2）编译代码。我运行Cachegrind，然后运行cg_annotate。我得到类似以下内容：

       Ir I1mr ILmr        Dr    D1mr    DLmr Dw D1mw DLmw
        .    .    .         .       .       .  .    .    .  template <typename Iterator>
       17    2    2         0       0       0  6    0    0  void lookup_in_random_order(Iterator begin, Iterator end)
        .    .    .         .       .       .  .    .    .  {
        .    .    .         .       .       .  .    .    .      const std::size_t N = std::distance(begin, end);
        .    .    .         .       .       .  .    .    .      std::vector<std::size_t> idx(N);
        .    .    .         .       .       .  .    .    .      std::iota(idx.begin(), idx.end(), 0);
        .    .    .         .       .       .  .    .    .
        4    0    0         0       0       0  2    1    1      std::srand(std::time(0));
        .    .    .         .       .       .  .    .    .      std::random_shuffle(idx.begin(), idx.end());
        .    .    .         .       .       .  .    .    .
3,145,729    0    0         0       0       0  0    0    0      for(std::size_t i = 0; i < N; ++i)
        .    .    .         .       .       .  .    .    .              my_search(begin, end, idx[i]);
        .    .    .         .       .       .  .    .    .
        .    .    .         .       .       .  .    .    .      std::random_shuffle(idx.begin(), idx.end());
        .    .    .         .       .       .  .    .    .
3,145,729    1    1         0       0       0  0    0    0      for(std::size_t i = 0; i < N; ++i)
        .    .    .         .       .       .  .    .    .      {
1,048,575    0    0 1,048,575 132,865 131,065  0    0    0              int s = idx[i];
        .    .    .         .       .       .  .    .    .              my_search(begin, end, s);
        .    .    .         .       .       .  .    .    .      }
        7    0    0         6       1       1  0    0    0  }

由于某些原因，某些线（尤其是最有趣的线！）由点组成。现在，Cachegrind manual表示“不适用于一行的事件由点表示。这对于区分不能发生的事件和不能但不能发生的事件很有用。”

应该如何解释？我的第一个想法是，编译器可能会优化我的搜索。我以为不可能，因为程序确实花了很多时间运行。仍然，我尝试不使用-O2标志进行编译，并且在某种意义上似乎可以正常工作：现在，调用my_search的每一行都记录了一些数字（不再有点！）。但是，由于明显的原因，这似乎不是正确的方法。

总的来说，有没有一种方法可以告诉Cachegrind：“特别关注这一行，我非常有兴趣引起多少缓存未命中”？

最佳答案

我的猜测是，使用O2可使编译器对看到点的函数执行自动内联。 Cachegrind将看不到内联函数调用，因为这些调用已消失。尝试“ -fno-inline”（Compiler options）

当然，无论有没有内联，您可能会有不同的缓存性能数字。