How and when to align to cache line size?


Question

Consider Dmitry Vyukov's excellent bounded MPMC queue, written in C++ (see: http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue).

He adds some padding variables. I presume this is to make the structure align to cache lines, for performance.

I have some questions:


  1. Why is it done in this way?
  2. Is it a portable method that will always work?
  3. In what cases would it be best to use __attribute__((aligned(64))) instead?
  4. Why would padding before a buffer pointer help with performance? Isn't just the pointer loaded into the cache, so it's really only the size of a pointer?

static size_t const     cacheline_size = 64;
typedef char            cacheline_pad_t [cacheline_size];

cacheline_pad_t         pad0_;          // keeps buffer_ off whatever precedes the object
cell_t* const           buffer_;
size_t const            buffer_mask_;
cacheline_pad_t         pad1_;          // separates the read-mostly fields from enqueue_pos_
std::atomic<size_t>     enqueue_pos_;
cacheline_pad_t         pad2_;          // keeps the two positions on separate lines
std::atomic<size_t>     dequeue_pos_;
cacheline_pad_t         pad3_;


Would this concept work under gcc for C code?

Answer

It's done this way so that different cores modifying different fields won't have to bounce the cache line containing both of them between their caches. In general, for a processor to access some data in memory, the entire cache line containing it must be in that processor's local cache. If it's modifying that data, that cache entry usually must be the only copy in any cache in the system (Exclusive mode in MESI/MOESI-style cache-coherence protocols). When separate cores try to modify different data that happen to live on the same cache line, and thus waste time moving that whole line back and forth, that's known as false sharing.

In the particular example you give, one core can be enqueueing an entry (reading buffer_ (shared) and writing only enqueue_pos_ (exclusive)) while another dequeues (buffer_ shared and dequeue_pos_ exclusive), without either core stalling on a cache line owned by the other.

The padding at the beginning means that buffer_ and buffer_mask_ end up on the same cache line, rather than split across two lines and thus requiring double the memory traffic to access.

I'm unsure whether the technique is entirely portable. The assumption is that each cacheline_pad_t will itself be aligned to a 64-byte (its size) cache line boundary, and therefore whatever follows it will be on the next cache line. As far as I know, the C and C++ language standards only require this of whole structures, so that they can live in arrays nicely without violating any member's alignment requirements. (See the comments.)

The attribute approach would be more compiler-specific, but might cut the size of this structure in half, since the padding would be limited to rounding each element up to a full cache line. That could be quite beneficial if one had a lot of these.

The same concept applies in C as well as C++.
