Does cache line size affect memory access latency?

Question

Intel architectures have used 64-byte cache lines for a long time. I am curious: if a processor had 32-byte or 16-byte cache lines instead of 64-byte ones, would this improve RAM-to-register data transfer latency? If so, by how much? If not, why?

Thanks.

Recommended answer

Transferring a larger amount of data does increase the communication time. But because of the way memory is organized, the increase is very small, and it does not affect memory-to-register latency.

Memory access operations are done in three steps:

  1. bitline precharge: the row address is sent and the internal buses of the memory are precharged (duration tRP)
  2. row access: an internal row of the memory is read and written to internal latches. During that time, the column address is sent (duration tRCD)
  3. column access: the selected columns are read from the row latches and start to be sent to the processor (duration tCL)

Row access is a long operation. A memory is a matrix of cells. To increase memory capacity, the cells must be made as small as possible. And when a row of cells is read, each cell has to drive a long, highly capacitive bus (the bitline) that runs along a memory column. The voltage swing is very low, so sense amplifiers are used to detect the small voltage variations.

Once this operation is done, a complete row is held in latches. Reading from the latches is fast, and the data is generally sent in burst mode.

Consider a typical DDR4 memory with a 1 GHz I/O clock. We generally have tRP/tRCD/tCL = 12-15/12-15/10-12 cycles, so the complete access takes around 40 memory cycles (with a 4 GHz processor, this is ~160 processor cycles). The data is then sent in burst mode, twice per cycle, so 2x64 bits (16 bytes) are transferred every cycle. The data transfer therefore adds 4 cycles for 64 bytes, and would add only 2 cycles for 32 bytes.
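The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope model: the specific timing values (14/14/12) are assumptions picked from the quoted ranges so that they sum to 40 cycles, and the function names are made up for illustration.

```python
# Rough DRAM read-latency model. All timings are illustrative assumptions
# chosen inside the ranges quoted in the answer, not datasheet values.
T_RP = 14    # bitline precharge, in memory cycles
T_RCD = 14   # row access, in memory cycles
T_CL = 12    # column access, in memory cycles
BYTES_PER_CYCLE = 2 * 8          # DDR: two 64-bit transfers per memory cycle
CPU_CYCLES_PER_MEM_CYCLE = 4     # 4 GHz core clock vs 1 GHz memory I/O clock


def burst_cycles(line_bytes):
    """Memory cycles needed to stream one cache line in burst mode."""
    return line_bytes // BYTES_PER_CYCLE


def row_miss_cycles(line_bytes):
    """Total memory cycles when the row must be precharged and opened."""
    return T_RP + T_RCD + T_CL + burst_cycles(line_bytes)


for line in (64, 32):
    mem = row_miss_cycles(line)
    print(f"{line}B line: {burst_cycles(line)} burst cycles, "
          f"{mem} memory cycles, {mem * CPU_CYCLES_PER_MEM_CYCLE} CPU cycles")
```

Under these assumptions, halving the line size saves 2 memory cycles out of more than 40.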

So reducing the cache line from 64B to 32B would reduce the access time by only ~2/40 = 5%.

If the row address does not change, precharging and reading the memory row are not required, and the access time is ~15 memory cycles. In that case, the relative cost of transferring 64B rather than 32B is larger but still limited: ~2/15 ≈ 13%.

Neither estimate takes into account the extra time required to process a miss in the memory hierarchy, so the actual percentage will be even smaller.
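The two estimates, and the effect of extra miss-handling time, can be checked with a small helper. The function name and the `extra_cycles` value of 20 are hypothetical, used only to show the trend; the answer gives no figure for the miss-handling overhead.

```python
def relative_saving(access_cycles, saved_cycles=2, extra_cycles=0):
    """Fraction of the total access time saved by the shorter burst.

    access_cycles: DRAM access time in memory cycles (about 40 on a row
                   miss, about 15 on a row hit, per the estimates above).
    extra_cycles:  assumed additional miss-handling time in the cache
                   hierarchy (hypothetical; not quantified in the answer).
    """
    return saved_cycles / (access_cycles + extra_cycles)


print(f"row miss: {relative_saving(40):.1%}")
print(f"row hit:  {relative_saving(15):.1%}")
print(f"row miss plus 20 assumed overhead cycles: "
      f"{relative_saving(40, extra_cycles=20):.1%}")
```

Any fixed overhead added to the denominator shrinks the relative benefit of the smaller line.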

Data can be sent "critical word first" by the memory. If the processor requires a given word, the address of this word is sent to memory. Once the row is read, the memory sends this word first, then the other words of the cache line. So the cache can serve the processor's request as soon as the first word is received, whatever the cache line size is, and decreasing the line width would have no impact on cache latency. If this feature is used, memory-to-register time would not change.
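As an illustration of critical-word-first delivery, the sketch below computes the order in which the words of a line would arrive, assuming a simple wrap-around burst starting at the requested word. Real DRAM burst orderings can differ (for example, interleaved ordering), and the function name is made up for this example.

```python
def critical_word_first_order(line_bytes, word_bytes, requested_offset):
    """Return word indices within a line in simplified wrap-around order.

    The word containing `requested_offset` comes first, followed by the
    remaining words, wrapping around at the end of the line.
    """
    words = line_bytes // word_bytes
    first = (requested_offset % line_bytes) // word_bytes
    return [(first + i) % words for i in range(words)]


# The processor wants the 8-byte word at byte offset 40 of a 64-byte line.
print(critical_word_first_order(64, 8, 40))   # [5, 6, 7, 0, 1, 2, 3, 4]
# Same request pattern with a 32-byte line: the needed word still comes first.
print(critical_word_first_order(32, 8, 8))    # [1, 2, 3, 0]
```

In both cases the requested word is the first element, which is why the line size does not change the time at which the processor can be served.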

In recent processors, exchanges between the different cache levels are based on the cache line width, so sending the critical word first does not bring any gain at that level.

Besides that, thanks to spatial locality, larger lines reduce compulsory misses, so reducing the line size would have a negative impact on the cache miss rate.
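The spatial-locality effect is easy to see on a sequential scan. The sketch below counts only compulsory (first-touch) misses, assuming an idealized cache with no capacity or conflict misses; the access pattern is a made-up example.

```python
def compulsory_misses(addresses, line_bytes):
    """Count first-touch misses: one per distinct cache line accessed."""
    seen_lines = set()
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line not in seen_lines:
            seen_lines.add(line)
            misses += 1
    return misses


# Sequential scan of 4 KiB, reading one 8-byte word at a time.
scan = range(0, 4096, 8)
for line_bytes in (64, 32, 16):
    print(f"{line_bytes}B lines: {compulsory_misses(scan, line_bytes)} misses")
```

For this pattern, each halving of the line size doubles the number of compulsory misses.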

Finally, using larger cache lines increases the data transfer rate between cache and memory.
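A rough way to see the transfer-rate effect is to divide the bytes delivered by the cycles spent per line fill. The sketch assumes one isolated row-miss access per line (about 40 memory cycles plus the burst) and ignores any overlap of requests that a real memory controller would perform; the names are illustrative.

```python
ACCESS_CYCLES = 40      # assumed precharge + row + column time, memory cycles
BYTES_PER_CYCLE = 16    # 2 x 64 bits per memory cycle in burst mode


def effective_throughput(line_bytes):
    """Bytes delivered per memory cycle for one isolated line fill."""
    total_cycles = ACCESS_CYCLES + line_bytes / BYTES_PER_CYCLE
    return line_bytes / total_cycles


for line_bytes in (64, 32, 16):
    print(f"{line_bytes}B lines: "
          f"{effective_throughput(line_bytes):.2f} bytes per memory cycle")
```

Because the fixed access time dominates, a larger line amortizes it over more bytes.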

The only negative aspect of large cache lines (besides the small increase in transfer time) is that the number of lines in the cache is reduced, so conflict misses may increase. But with the high associativity of modern caches, this effect is limited.
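To make the trade-off concrete, the sketch below computes how many lines and sets a cache has for a given line size. The 32 KiB, 8-way configuration is an assumed example, not a figure from the answer.

```python
def cache_geometry(cache_bytes, line_bytes, ways):
    """Return (number of lines, number of sets) of a set-associative cache."""
    lines = cache_bytes // line_bytes
    sets = lines // ways
    return lines, sets


# Hypothetical 32 KiB, 8-way set-associative cache.
for line_bytes in (64, 32):
    lines, sets = cache_geometry(32 * 1024, line_bytes, ways=8)
    print(f"{line_bytes}B lines: {lines} lines in {sets} sets")
```

With 64-byte lines the same capacity holds half as many lines, but each set still offers 8 candidate locations, which is what limits the increase in conflict misses.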


07-23 01:15